Abstract

The information-spectrum analysis made by Han for classical hypothesis testing for simple hypotheses is extended to a unifying 
framework including both classical and quantum hypothesis testing. The results are also applied to fixed-length source coding 
when loosening the normalizing condition for probability distributions and for quantum states. We establish general formulas for 
several quantities relating to the asymptotic optimality of tests/codes in terms of classical and quantum information spectra. 

Index Terms 

Information spectrum, Quantum hypothesis testing, Classical hypothesis testing, Fixed-length source coding, Optimal exponent 



I. Introduction 

One of the principal aims of information theory is to establish a link between two different kinds of quantities. One is an
operational quantity, which is defined as the optimal or limiting value of a concrete parameter such as code length, compression
rate, transmission rate, convergence rate of error probabilities, etc. The other is an information quantity such as the entropy,
divergence, mutual information, etc. Note that the latter, in its definition, is more abstract than the former, and the meaning
of the latter is usually clarified by linking it to the former. In the so-called information spectrum method, which first appeared
in a series of joint papers of Han and Verdú (e.g., [1], [2]), the process of establishing such a link is intentionally divided
into two parts by introducing a third kind of quantity, the information spectrum, putting it between an operational quantity and
an abstract information quantity. This setting allows us to pursue many problems of information theory in their most general
forms; see [3] for the whole perspective of the method.

For instance, let us consider two sequences of random variables $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, where $X^n$ and $Y^n$ for
each $n$ are supposed to take values in a common discrete (finite or countable)$^1$ set $\mathcal{X}^n$ subject to probability distributions (mass
functions) $P_{X^n}$ and $P_{Y^n}$, respectively. Note that $\mathcal{X}^n$ does not need to be the product set $\mathcal{X} \times \cdots \times \mathcal{X}$ of a set $\mathcal{X}$, although the
notation suggests that the product set is a representative example of $\mathcal{X}^n$. Han [3], [4] studied the hypothesis testing problem
for the simple hypotheses consisting of the general processes $\mathbf{X}$ and $\mathbf{Y}$ by means of the information spectrum, which is the
asymptotic behavior of the random variable $\frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{Y^n}(X^n)}$ (or the analogous ratio evaluated at $Y^n$) in this case. He succeeded in representing



several asymptotic characteristics of hypothesis testing in terms of the information spectrum with no or very few assumptions 
on the processes. The term 'spectrum' is intended to mean that the scope of the theory covers the general case when the 
probability distribution of $\frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{Y^n}(X^n)}$ does not necessarily get concentrated at a point, but may spread out, as $n \to \infty$.

The purpose of the present paper is to extend, complement and refine Han's analysis of hypothesis testing from several 
viewpoints. The biggest motivation comes from the question of how to extend the analysis to quantum hypothesis testing. 
Following the above setting, we are naturally led to consider the problem of hypothesis testing for the simple hypotheses 
consisting of two sequences of quantum states $\vec{\rho} = \{\rho_n\}_{n=1}^{\infty}$ and $\vec{\sigma} = \{\sigma_n\}_{n=1}^{\infty}$, where $\rho_n$ and $\sigma_n$ are density operators on
a common Hilbert space $\mathcal{H}_n$ for each $n$. However, it is by no means obvious whether a similar analysis to that of Han is
applicable to the quantum setting. We show in this paper that it is actually possible to extend Han's results by appropriately 
choosing a quantum analogue of the information spectrum so that both the classical and quantum cases are treated in a unifying 
framework. Although this does not mean that application to a special class of quantum processes such as i.i.d. (independent 
and identically distributed) ones immediately yields significant results, it seems to suggest a new approach to studying the 
quantum asymptotics and to elucidating a general principle underlying classical/quantum information theory. 

Hiroshi Nagaoka is with Graduate School of Information Systems, The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu-shi, Tokyo, 
182-8585, Japan (e-mail: nagaoka@is.uec.ac.jp). 
1 In this paper we only treat the discrete case to simplify the description when considering the classical hypothesis testing, although it is straightforward
as pointed out in [3], [4] to extend the argument to the general case where $\{\mathcal{X}^n\}$ are arbitrary measurable spaces.

It should be noted that, even though the statements of our theorems are almost parallel to those for the classical setting,
some of the proofs are essentially different from the original proofs of Han. The technique of information-spectrum slicing, 
which was effectively used in [3], [4], [6] to prove several important theorems, consists of a procedure of partitioning a set and 
does not straightforwardly apply to the quantum setting. We are thus forced to look for another idea for proofs. Fortunately, 
we have successfully found a way which does not need information-spectrum slicing and is applicable to the quantum setting. 
Moreover, the new proofs are much simpler than the original ones even in the classical case. This simplification is a byproduct 
of our attempt to pursue quantum extensions. 

This paper also contains results such as those of Theorems [4] and Q which improve the corresponding original theorems 
when applied to the classical setting. In addition, from the beginning, we treat generalized hypothesis testing in the sense of 
Han [3], [4], namely that the alternative hypothesis $P_{Y^n}$ can be any nonnegative measure. This enables us to unify hypothesis
testing and fixed-length source coding in a natural way. 

This paper aims at presenting a unifying framework to treat the classical and quantum generalized hypothesis testing problem 
in the most general and simplest manner. After presenting the notation in section II, the concept of information spectrum is
introduced in section III for both classical and quantum cases. In sections IV, VI and VII various types of asymptotic bounds
on the hypothesis testing problems for classical and quantum general processes are studied, basically following the problem
settings and the notation given in [3], [4]. In section V we make some observations on Stein's lemma for classical and quantum
i.i.d. processes in the light of the results of section IV. Applications to the classical fixed-length source coding are presented
in section VIII and concluding remarks are given in section IX.

II. A UNIFYING DESCRIPTION OF CLASSICAL AND QUANTUM GENERALIZED HYPOTHESIS TESTING 

In this section we present a common language to treat classical and quantum hypothesis testing and fixed-length source 
coding in a unifying manner. We begin by considering the classical case. Suppose that we are given a sequence of discrete 
sets $\vec{\mathcal{X}} = \{\mathcal{X}^n\}_{n=1}^{\infty}$, a sequence of probability measures $\vec{\rho} = \{\rho_n\}_{n=1}^{\infty}$ and a sequence of nonnegative (not necessarily
probability) measures $\vec{\sigma} = \{\sigma_n\}_{n=1}^{\infty}$, which are represented by mass functions $\rho_n : \mathcal{X}^n \to [0, 1]$ with $\sum_{x \in \mathcal{X}^n} \rho_n(x) = 1$ and
$\sigma_n : \mathcal{X}^n \to [0, \infty)$. In the usual hypothesis testing problem, both $\rho_n$ and $\sigma_n$ are probability measures denoted as $\rho_n = P_{X^n}$
and $\sigma_n = P_{Y^n}$. On the other hand, $\sigma_n$ should be taken to be the counting measure on $\mathcal{X}^n$ when considering the source coding
problem (see [3], [4] and section VIII below). For a function (random variable) $A : \mathcal{X}^n \to \mathbb{R}$, we write

$$ \rho_n[A] = \sum_{x \in \mathcal{X}^n} \rho_n(x) A(x) \quad \text{and} \quad \sigma_n[A] = \sum_{x \in \mathcal{X}^n} \sigma_n(x) A(x). $$

Let $\mathcal{T}_n$ be the set of $[0, 1]$-valued functions defined on $\mathcal{X}^n$. When both $\rho_n$ and $\sigma_n$ are probability measures, we regard an
element $T_n$ of $\mathcal{T}_n$ as a randomized test for the simple hypotheses $\{\rho_n, \sigma_n\}$ by interpreting $T_n(x)$ ($\in [0, 1]$) as the probability of
accepting the hypothesis $\rho_n$ when the data $x$ is observed. In particular, a deterministic test is an element of $\mathcal{T}_n$ taking values in
$\{0, 1\}$, which is the characteristic function of the acceptance region $\{x \in \mathcal{X}^n \mid T_n(x) = 1\}$ for the hypothesis $\rho_n$. Depending
on whether the true distribution is $\rho_n$ or $\sigma_n$, the probability of accepting the hypothesis $\rho_n$ turns out to be $\rho_n[T_n]$ or $\sigma_n[T_n]$,
and the error probabilities of the first and second kinds are represented as

$$ \alpha_n[T_n] = 1 - \rho_n[T_n] \quad \text{and} \quad \beta_n[T_n] = \sigma_n[T_n]. \tag{1} $$

In the general situation where $\sigma_n$ is an arbitrary nonnegative measure, we still call elements of $\mathcal{T}_n$ tests and use the same
notation as in (1). Letting $I$ and $0$ denote the constant functions on $\mathcal{X}^n$ such that $I(x) = 1$ and $0(x) = 0$ for all $x \in \mathcal{X}^n$, a
test $T_n$ is characterized as a function such that $0 \le T_n \le I$, where, and in the sequel, we write $A \le B$ for functions $A$ and
$B$ when $A(x) \le B(x)$ for all $x$.
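As a minimal sketch of the classical notation above, the following Python snippet evaluates $\rho_n[T_n]$, $\sigma_n[T_n]$ and the two error probabilities of equation (1) for a randomized test. The three-point set, the mass functions and the test are hypothetical toy values chosen only for illustration; the function names are likewise assumptions.

```python
from typing import Dict, Hashable, Tuple

Mass = Dict[Hashable, float]


def expectation(measure: Mass, test: Mass) -> float:
    """Return measure[T] = sum_x measure(x) * T(x)."""
    return sum(weight * test.get(x, 0.0) for x, weight in measure.items())


def error_probabilities(rho: Mass, sigma: Mass, test: Mass) -> Tuple[float, float]:
    """Return (alpha, beta) = (1 - rho[T], sigma[T]) as in equation (1)."""
    return 1.0 - expectation(rho, test), expectation(sigma, test)


# Hypothetical toy data on the set {0, 1, 2}.
rho = {0: 0.5, 1: 0.3, 2: 0.2}
sigma = {0: 0.1, 1: 0.3, 2: 0.6}
# Accept rho surely at x = 0, with probability 1/2 at x = 1, never at x = 2.
test = {0: 1.0, 1: 0.5, 2: 0.0}

alpha, beta = error_probabilities(rho, sigma, test)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

Nothing in the sketch requires `sigma` to be normalized, which mirrors the generalized setting where $\sigma_n$ is an arbitrary nonnegative measure.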

Let us turn to the quantum case. Suppose that a sequence of Hilbert spaces $\vec{\mathcal{H}} = \{\mathcal{H}_n\}_{n=1}^{\infty}$ is given. Let $\vec{\rho} = \{\rho_n\}_{n=1}^{\infty}$
be a sequence of density operators (i.e., nonnegative self-adjoint operators with trace one) on $\{\mathcal{H}_n\}$, and $\vec{\sigma} = \{\sigma_n\}_{n=1}^{\infty}$ be a
sequence of bounded nonnegative self-adjoint (but not necessarily density or trace-class) operators on $\{\mathcal{H}_n\}$. For a bounded
self-adjoint operator $A$ on $\mathcal{H}_n$ we write

$$ \rho_n[A] = \mathrm{Tr}\,(\rho_n A) \quad \text{and} \quad \sigma_n[A] = \mathrm{Tr}\,(\sigma_n A). $$

Let $\mathcal{T}_n$ be the set of self-adjoint operators $T_n$ satisfying $0 \le T_n \le I$; i.e., both $T_n$ and $I - T_n$ are nonnegative with $I$ denoting
the identity operator on $\mathcal{H}_n$. An element $T_n$ of $\mathcal{T}_n$ can be considered to represent a $\{0, 1\}$-valued measurement on $\mathcal{H}_n$ by
identifying it with the POVM (positive operator-valued measure) $\{T_n(0), T_n(1)\} = \{T_n, I - T_n\}$, and is called a test for the
hypotheses $\{\rho_n, \sigma_n\}$ with the interpretation that the measurement result $0$ means the acceptance of the hypothesis $\rho_n$.

We define $\alpha_n[T_n]$ and $\beta_n[T_n]$ by the same equations as (1), which turn out to be the error probabilities of the first and
second kinds when both $\rho_n$ and $\sigma_n$ are density operators.

We have thus reached a common setting to treat generalized hypothesis testing of classical and quantum systems for which 
sequences 

$$ \vec{\rho} = \{\rho_n\}_{n=1}^{\infty}, \quad \vec{\sigma} = \{\sigma_n\}_{n=1}^{\infty}, \quad \text{and} \quad \vec{T} = \{T_n\}_{n=1}^{\infty} $$

are given. Note that

$$ 0 = \rho_n[0] \le \rho_n[T_n] \le \rho_n[T'_n] \le \rho_n[I] = 1 \tag{2} $$

and

$$ 0 = \sigma_n[0] \le \sigma_n[T_n] \le \sigma_n[T'_n] \le \sigma_n[I] \le \infty \tag{3} $$

always hold for any tests $T_n, T'_n \in \mathcal{T}_n$ such that $T_n \le T'_n$. We shall work with this setting throughout this paper.

Remark 1: Readers who are familiar with the language of operator algebras may immediately extend the setting to a more
general one in which we are given a sequence of a certain kind of *-algebras $\vec{\mathcal{A}} = \{\mathcal{A}_n\}_{n=1}^{\infty}$ containing the identity elements
$I$, a sequence of states $\vec{\rho} = \{\rho_n\}_{n=1}^{\infty}$ (linear functionals mapping nonnegative elements to nonnegative numbers and the
identity elements to 1) and a sequence of positive weights $\vec{\sigma} = \{\sigma_n\}_{n=1}^{\infty}$ (linear functionals mapping nonnegative elements to
nonnegative numbers or $\infty$), with defining $\mathcal{T}_n = \{T_n \in \mathcal{A}_n \mid 0 \le T_n = T_n^* \le I\}$. The classical case and the quantum case
treated above correspond to $\mathcal{A}_n = L^{\infty}(\mathcal{X}^n)$ (the set of complex-valued bounded functions on $\mathcal{X}^n$) and $\mathcal{A}_n = \mathcal{B}(\mathcal{H}_n)$ (the set
of bounded operators on $\mathcal{H}_n$) respectively.

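The following notation is introduced in order to represent several variations of error exponents in a unifying manner. Given
a sequence of tests $\vec{T} = \{T_n\}_{n=1}^{\infty}$ such that $T_n \in \mathcal{T}_n$ ($\forall n$), let

$$ \eta_n[T_n] \overset{\mathrm{def}}{=} -\frac{1}{n} \log \alpha_n[T_n] = -\frac{1}{n} \log (1 - \rho_n[T_n]), $$

$$ \zeta_n[T_n] \overset{\mathrm{def}}{=} -\frac{1}{n} \log \beta_n[T_n] = -\frac{1}{n} \log \sigma_n[T_n], $$

and

$$ \underline{\alpha}[\vec{T}] \overset{\mathrm{def}}{=} \liminf_{n \to \infty} \alpha_n[T_n], \qquad \overline{\alpha}[\vec{T}] \overset{\mathrm{def}}{=} \limsup_{n \to \infty} \alpha_n[T_n], $$

$$ \underline{\beta}[\vec{T}] \overset{\mathrm{def}}{=} \liminf_{n \to \infty} \beta_n[T_n], \qquad \overline{\beta}[\vec{T}] \overset{\mathrm{def}}{=} \limsup_{n \to \infty} \beta_n[T_n], $$

$$ \underline{\eta}[\vec{T}] \overset{\mathrm{def}}{=} \liminf_{n \to \infty} \eta_n[T_n], \qquad \overline{\eta}[\vec{T}] \overset{\mathrm{def}}{=} \limsup_{n \to \infty} \eta_n[T_n], $$

$$ \underline{\zeta}[\vec{T}] \overset{\mathrm{def}}{=} \liminf_{n \to \infty} \zeta_n[T_n], \qquad \overline{\zeta}[\vec{T}] \overset{\mathrm{def}}{=} \limsup_{n \to \infty} \zeta_n[T_n]. $$

When $\vec{T} = \{T_n\}$ is replaced with its complement $\vec{T}^c = \{T_n^c \overset{\mathrm{def}}{=} I - T_n\}$, we add the superscript $c$ to these symbols as

$$ \alpha_n^c[T_n] = \alpha_n[T_n^c], \quad \eta_n^c[T_n] = \eta_n[T_n^c], \quad \underline{\zeta}^c[\vec{T}] = \underline{\zeta}[\vec{T}^c], \quad \text{etc.} $$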



III. Information spectrum and likelihood tests 

As mentioned in the introduction, the information spectrum for classical hypothesis testing is the asymptotic behavior of 
the random variable 

$$ Z_n \overset{\mathrm{def}}{=} \frac{1}{n} \log \frac{\rho_n(X^n)}{\sigma_n(X^n)}, $$

where $X^n$ is supposed to be subject to the probability distribution $\rho_n$. Han [3], [4] called $Z_n$ the divergence-density rate and
derived several formulas for representing the asymptotic characteristics of the classical hypothesis testing problem in terms
of the information spectrum. Now we are led to the following question: what is the quantum analogue of the information
spectrum? At first glance, it may seem natural to consider the quantum observable represented by the self-adjoint
operator $\frac{1}{n}(\log \rho_n - \log \sigma_n)$ and its probability distribution under the quantum state $\rho_n$. Unfortunately, this line is not directly
linked to the hypothesis testing problem. We give up seeking the quantum analogue of $Z_n$, but instead seek that of a likelihood
test $S_n(a) : \mathcal{X}^n \to [0, 1]$ obeying

$$ S_n(a)(x) = \begin{cases} 1 & \text{if } \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} > a, \\ 0 & \text{if } \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} < a, \end{cases} \tag{4} $$

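where $a$ is an arbitrary real number. Note that there is an ambiguity in this definition of $S_n(a)$ when some $x$ satisfies
$\frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} = a$, including two special cases where $S_n(a)$ are the deterministic tests with the acceptance regions

$$ \left\{ x \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} \ge a \right\} \quad \text{and} \quad \left\{ x \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} > a \right\}. $$

In general $S_n(a)$ may be randomized with an arbitrary probability when the obtained data $x$ satisfies $\frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} = a$. Denoting
the characteristic functions of the sets $\{x \mid A(x) \ge c\}$ and $\{x \mid A(x) > c\}$ by $\{A \ge c\}$ and $\{A > c\}$ respectively, equation (4)
is rewritten as

$$ \left\{ \frac{1}{n} \log \frac{\rho_n}{\sigma_n} > a \right\} \le S_n(a) \le \left\{ \frac{1}{n} \log \frac{\rho_n}{\sigma_n} \ge a \right\}, \tag{5} $$

or equivalently as

$$ \{\rho_n - e^{na} \sigma_n > 0\} \le S_n(a) \le \{\rho_n - e^{na} \sigma_n \ge 0\}. \tag{6} $$

The family of tests $\{S_n(a)\}_{a \in \mathbb{R}}$ characterizes the information spectrum by $\mathrm{Prob}\{Z_n \underset{(\geq)}{>} a\} = \rho_n[S_n(a)]$.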

In order to introduce the quantum analogue of $S_n(a)$, we need some preliminaries. For a self-adjoint operator $A$ on a Hilbert
space with the spectral decomposition$^2$ $A = \sum_i \lambda_i E_i$, where $\{\lambda_i\}$ are the eigenvalues and $\{E_i\}$ are the orthogonal projections
onto the corresponding eigenspaces, we define

$$ \{A \ge 0\} \overset{\mathrm{def}}{=} \sum_{i : \lambda_i \ge 0} E_i \quad \text{and} \quad \{A > 0\} \overset{\mathrm{def}}{=} \sum_{i : \lambda_i > 0} E_i. $$

These are the orthogonal projections onto the direct sum of eigenspaces corresponding to nonnegative and positive eigenvalues,
respectively. The projections $\{A \le 0\}$ and $\{A < 0\}$, or more generally $\{A \ge B\} = \{A - B \ge 0\}$, $\{A < B\} = \{A - B < 0\}$,
etc., are defined similarly. We have

$$ \mathrm{Tr}\,(A \{A > 0\}) \ge 0, \tag{7} $$

and for any test $T$ on $\mathcal{H}$

$$ \mathrm{Tr}\,(A \{A > 0\}) \ge \mathrm{Tr}\,(A T). \tag{8} $$

The first inequality is obvious, while the second follows from $0 \le T \le I$ as

$$ \mathrm{Tr}\,(A T) = \mathrm{Tr}\,(A \{A > 0\} T) + \mathrm{Tr}\,(A \{A \le 0\} T) \le \mathrm{Tr}\,(A \{A > 0\} T) \le \mathrm{Tr}\,(A \{A > 0\}). $$

Note that $\{A > 0\}$ in (7) and (8) can be replaced with $\{A \ge 0\}$ or, more generally, with any self-adjoint operator $S$ satisfying
$\{A > 0\} \le S \le \{A \ge 0\}$.
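The inequalities (7) and (8) can be checked numerically in the smallest nontrivial case. The sketch below is an illustration under stated assumptions, not part of the original argument: it restricts attention to real symmetric $2 \times 2$ matrices, builds $\{A > 0\}$ from the closed-form spectral projections, and compares $\mathrm{Tr}\,(A\{A > 0\})$ with $\mathrm{Tr}\,(AT)$ for randomly drawn real tests $T$. The matrix $A = \rho - e^{a}\sigma$ uses the $2 \times 2$ density matrices of Example 1 in Section V (with the off-diagonal entries of $\sigma$ assumed to be zero) and the hypothetical choice $a = 0.2$.

```python
import math
import random

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def trace_product(x, y):
    """Tr(XY) for 2x2 matrices given as nested lists."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def positive_projection(a):
    """Projection {A > 0} for a real symmetric 2x2 matrix A."""
    mean = (a[0][0] + a[1][1]) / 2.0
    gap = math.hypot((a[0][0] - a[1][1]) / 2.0, a[0][1])
    lam_hi, lam_lo = mean + gap, mean - gap
    if gap == 0.0:  # A is a multiple of the identity
        scale = 1.0 if lam_hi > 0 else 0.0
        return [[scale * IDENTITY[i][j] for j in range(2)] for i in range(2)]
    # E_hi = (A - lam_lo I) / (lam_hi - lam_lo) projects onto the top eigenvector.
    e_hi = [[(a[i][j] - lam_lo * IDENTITY[i][j]) / (2.0 * gap) for j in range(2)]
            for i in range(2)]
    w_hi = 1.0 if lam_hi > 0 else 0.0
    w_lo = 1.0 if lam_lo > 0 else 0.0
    return [[w_hi * e_hi[i][j] + w_lo * (IDENTITY[i][j] - e_hi[i][j])
             for j in range(2)] for i in range(2)]


def random_test(rng):
    """A random real T with 0 <= T <= I: eigenvalues in [0, 1], rotated basis."""
    t1, t2, angle = rng.random(), rng.random(), rng.uniform(0.0, math.pi)
    c, s = math.cos(angle), math.sin(angle)
    off = (t1 - t2) * c * s
    return [[t1 * c * c + t2 * s * s, off], [off, t1 * s * s + t2 * c * c]]


a = 0.2  # hypothetical threshold
rho = [[0.75, 0.35], [0.35, 0.25]]
sigma = [[0.9, 0.0], [0.0, 0.1]]  # zero off-diagonal entries are an assumption
A = [[rho[i][j] - math.exp(a) * sigma[i][j] for j in range(2)] for i in range(2)]

P = positive_projection(A)
best = trace_product(A, P)
rng = random.Random(0)
inequality_7 = best >= 0.0
inequality_8 = all(best >= trace_product(A, random_test(rng)) - 1e-12
                   for _ in range(1000))
print(inequality_7, inequality_8)
```

A random search of this kind can only fail to find a counterexample; the proof given above is what establishes (8) for all tests.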

Now, in the quantum setting where a sequence of density operators $\vec{\rho} = \{\rho_n\}$ and that of bounded nonnegative self-adjoint
operators $\vec{\sigma} = \{\sigma_n\}$ are given, let $S_n(a)$ be a self-adjoint operator satisfying the same equation as (6). Since $S_n(a)$ satisfies
$0 \le S_n(a) \le I$, it is a test in our sense. Indeed, it is the quantum analogue of the likelihood test introduced by Holevo
[7] and Helstrom [8] when $\sigma_n$ is a density operator. Note that (5) is not equivalent to (6) in the quantum case unless $\rho_n$
and $\sigma_n$ commute. As in the classical case, there is an ambiguity in the definition of $S_n(a)$, including two special cases
$S_n(a) = \{\rho_n - e^{na} \sigma_n > 0\}$ and $S_n(a) = \{\rho_n - e^{na} \sigma_n \ge 0\}$. Some quantities defined in the sequel may depend on a choice
of $S_n(a)$ within (6), but this will not cause any essential difference in the theorems represented in terms of these quantities.
We sometimes write $\{\rho_n - e^{na} \sigma_n \underset{(\geq)}{>} 0\}$ to mean $S_n(a)$, suggesting this ambiguity.

From (7) and (8) we have

$$ (\rho_n - e^{na} \sigma_n)[S_n(a)] \ge 0, \tag{9} $$

and for any test $T_n$

$$ (\rho_n - e^{na} \sigma_n)[S_n(a)] \ge (\rho_n - e^{na} \sigma_n)[T_n]. \tag{10} $$

In addition, letting $S_n^c(a) \overset{\mathrm{def}}{=} I - S_n(a) = \{\rho_n - e^{na} \sigma_n \underset{(<)}{\leq} 0\}$ we have

$$ (\rho_n - e^{na} \sigma_n)[S_n^c(a)] \le 0, \tag{11} $$

$$ (\rho_n - e^{na} \sigma_n)[S_n^c(a)] \le (\rho_n - e^{na} \sigma_n)[T_n^c]. \tag{12} $$

These are rewritten as

$$ \alpha_n(a) + e^{na} \beta_n(a) \le 1, \tag{13} $$

$$ \alpha_n(a) + e^{na} \beta_n(a) \le \alpha_n[T_n] + e^{na} \beta_n[T_n], \tag{14} $$

and

$$ \alpha_n(a) - e^{na} \beta_n^c(a) \le 0, \tag{15} $$

$$ \alpha_n(a) - e^{na} \beta_n^c(a) \le \alpha_n[T_n] - e^{na} \beta_n^c[T_n], \tag{16} $$

where

$$ \alpha_n(a) \overset{\mathrm{def}}{=} \alpha_n[S_n(a)] = \rho_n[\{\rho_n - e^{na} \sigma_n \underset{(<)}{\leq} 0\}], $$

$$ \beta_n(a) \overset{\mathrm{def}}{=} \beta_n[S_n(a)] = \sigma_n[\{\rho_n - e^{na} \sigma_n \underset{(\geq)}{>} 0\}], $$

$$ \beta_n^c(a) \overset{\mathrm{def}}{=} \beta_n[S_n^c(a)] = \sigma_n[\{\rho_n - e^{na} \sigma_n \underset{(<)}{\leq} 0\}]. $$

2 We assume here that $A$ has discrete eigenvalues since it suffices for our main concern and simplifies the description, although the assumption is not
essential.

Needless to say, these properties also hold in the classical case. In particular, the inequality (14) in the classical case is the
Neyman-Pearson lemma, whose quantum extension was given in [7], [8]. All the results in the later sections, including the
classical ones obtained by Han, are derived only from the inequalities (13) through (16). This fact may be one of the most
important findings of the present paper.
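To make the role of (13) and (14) concrete in the classical case, the following sketch builds the deterministic likelihood test $S_n(a) = \{\rho_n - e^{na}\sigma_n > 0\}$ on a four-point set and compares $\alpha_n(a) + e^{na}\beta_n(a)$ with the same combination for randomly drawn randomized tests. The mass functions, the threshold and the choice $n = 1$ are hypothetical; $\sigma$ is deliberately left unnormalized to reflect the generalized setting.

```python
import math
import random


def likelihood_test(rho, sigma, n, a):
    """Indicator of {rho - e^{na} sigma > 0}, one entry per point of the set."""
    return [1.0 if r - math.exp(n * a) * s > 0 else 0.0
            for r, s in zip(rho, sigma)]


def alpha(rho, test):
    """First-kind error 1 - rho[T]."""
    return 1.0 - sum(r * t for r, t in zip(rho, test))


def beta(sigma, test):
    """Second-kind quantity sigma[T]."""
    return sum(s * t for s, t in zip(sigma, test))


n, a = 1, 0.2                  # hypothetical block length and threshold
rho = [0.4, 0.3, 0.2, 0.1]     # probability mass function
sigma = [0.1, 0.2, 0.3, 0.9]   # nonnegative measure, not normalized

s_test = likelihood_test(rho, sigma, n, a)
lhs = alpha(rho, s_test) + math.exp(n * a) * beta(sigma, s_test)

holds_13 = lhs <= 1.0 + 1e-12
rng = random.Random(0)
holds_14 = True
for _ in range(1000):
    t = [rng.random() for _ in rho]  # a randomized test with values in [0, 1]
    rhs = alpha(rho, t) + math.exp(n * a) * beta(sigma, t)
    holds_14 = holds_14 and (lhs <= rhs + 1e-12)
print(s_test, holds_13, holds_14)
```

The check illustrates that the likelihood test minimizes the weighted sum $\alpha_n + e^{na}\beta_n$ over all tests, which is the content of the Neyman-Pearson lemma in the form (14).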

Let us see that $\alpha_n(a)$ ($\beta_n(a)$, resp.) is monotonically nondecreasing (nonincreasing, resp.) as a function of $a$; i.e., if $a < b$
then

$$ \alpha_n(a) \le \alpha_n(b) \quad \text{and} \quad \beta_n(a) \ge \beta_n(b). \tag{17} $$

In the classical case, this is obvious because $\{x \mid \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} > a\} \supset \{x \mid \frac{1}{n} \log \frac{\rho_n(x)}{\sigma_n(x)} > b\}$ if $a < b$. In order to show the
monotonicity in the quantum case, we invoke (14) to yield

$$ \alpha_n(a) + e^{na} \beta_n(a) \le \alpha_n(b) + e^{na} \beta_n(b), $$

$$ \alpha_n(b) + e^{nb} \beta_n(b) \le \alpha_n(a) + e^{nb} \beta_n(a). $$

These are rewritten as

$$ e^{na} \{\beta_n(a) - \beta_n(b)\} \le \alpha_n(b) - \alpha_n(a) \le e^{nb} \{\beta_n(a) - \beta_n(b)\}, \tag{18} $$

which leads to (17). This monotonicity will be used implicitly throughout the later arguments.
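Let

$$ \eta_n(a) \overset{\mathrm{def}}{=} \eta_n[S_n(a)] = -\frac{1}{n} \log \alpha_n(a) = -\frac{1}{n} \log \rho_n[\{\rho_n - e^{na} \sigma_n \underset{(<)}{\leq} 0\}], $$

$$ \zeta_n(a) \overset{\mathrm{def}}{=} \zeta_n[S_n(a)] = -\frac{1}{n} \log \beta_n(a) = -\frac{1}{n} \log \sigma_n[\{\rho_n - e^{na} \sigma_n \underset{(\geq)}{>} 0\}], $$

$$ \zeta_n^c(a) \overset{\mathrm{def}}{=} \zeta_n^c[S_n(a)] = -\frac{1}{n} \log \beta_n^c(a) = -\frac{1}{n} \log \sigma_n[\{\rho_n - e^{na} \sigma_n \underset{(<)}{\leq} 0\}], $$

and

$$ \underline{\alpha}(a) \overset{\mathrm{def}}{=} \underline{\alpha}[\vec{S}(a)] = \liminf_{n \to \infty} \alpha_n(a), \qquad \overline{\alpha}(a) \overset{\mathrm{def}}{=} \overline{\alpha}[\vec{S}(a)] = \limsup_{n \to \infty} \alpha_n(a), $$

$$ \underline{\eta}(a) \overset{\mathrm{def}}{=} \underline{\eta}[\vec{S}(a)] = \liminf_{n \to \infty} \eta_n(a), \qquad \underline{\zeta}(a) \overset{\mathrm{def}}{=} \underline{\zeta}[\vec{S}(a)] = \liminf_{n \to \infty} \zeta_n(a), $$

$$ \overline{\zeta}^c(a) \overset{\mathrm{def}}{=} \overline{\zeta}^c[\vec{S}(a)] = \limsup_{n \to \infty} \zeta_n^c(a), \quad \text{etc.,} $$

where $\vec{S}(a)$ denotes the sequence $\{S_n(a)\}_{n=1}^{\infty}$. Note that $\eta_n(a)$, $\zeta_n^c(a)$, $\underline{\eta}(a)$ and $\overline{\zeta}^c(a)$ are monotonically nonincreasing, while
$\zeta_n(a)$ and $\underline{\zeta}(a)$ are monotonically nondecreasing. In addition, since (13) yields $\beta_n(a) \le e^{-na}$, we have

$$ \underline{\zeta}(a) \ge a. \tag{19} $$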
IV. ASYMPTOTICS OF STEIN'S TYPE AND SPECTRAL DIVERGENCE RATES

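In this section we treat the following quantities:

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \overline{\alpha}[\vec{T}] \le \varepsilon \} = \sup \{ R \mid \exists \vec{T}, \ \overline{\alpha}[\vec{T}] \le \varepsilon \text{ and } \underline{\zeta}[\vec{T}] \ge R \}, $$

$$ B^{\dagger}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \underline{\alpha}[\vec{T}] < \varepsilon \} = \inf \{ R \mid \forall \vec{T}, \text{ if } \underline{\zeta}[\vec{T}] \ge R \text{ then } \underline{\alpha}[\vec{T}] \ge \varepsilon \}, $$

where $\varepsilon$ is a constant lying in the interval $[0, 1]$, and in particular

$$ B(\vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} B(0 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \lim_{n \to \infty} \alpha_n[T_n] = 0 \} = \sup \{ R \mid \exists \vec{T}, \ \lim_{n \to \infty} \alpha_n[T_n] = 0 \text{ and } \underline{\zeta}[\vec{T}] \ge R \}, $$

$$ B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} B^{\dagger}(1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \underline{\alpha}[\vec{T}] < 1 \} = \inf \{ R \mid \forall \vec{T}, \text{ if } \underline{\zeta}[\vec{T}] \ge R \text{ then } \lim_{n \to \infty} \alpha_n[T_n] = 1 \}. $$

As will be seen in the next section, these quantities are the main concern of Stein's lemma in the classical i.i.d. case. Note
that we formally have $B(1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \infty$ and $B^{\dagger}(0 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = -\infty$, although they are of no importance. Obviously, for any
$0 \le \varepsilon_1 < \varepsilon_2 \le 1$

$$ B(\vec{\rho} \,\|\, \vec{\sigma}) \le B(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le B(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}), $$

$$ B^{\dagger}(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le B^{\dagger}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}), $$

and

$$ B(\vec{\rho} \,\|\, \vec{\sigma}) \le B(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le B^{\dagger}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}). $$

In addition, $B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma})$ is right continuous for any $0 \le \varepsilon < 1$ in the sense that

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \max_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \overline{\alpha}[\vec{T}] \le \varepsilon \} = \inf_{\varepsilon' > \varepsilon} B(\varepsilon' \,|\, \vec{\rho} \,\|\, \vec{\sigma}). \tag{20} $$

To show this, let $\{\delta_k\}_{k=1}^{\infty}$ be an arbitrary sequence of positive numbers satisfying $\lim_{k \to \infty} \delta_k = 0$. Then for each $k$ there exist
a test $\vec{T}^{(k)} = \{T_n^{(k)}\}$ and a number $n_k$ such that $\alpha_n[T_n^{(k)}] \le \varepsilon + 2\delta_k$ and $\zeta_n[T_n^{(k)}] \ge B(\varepsilon + \delta_k \,|\, \vec{\rho} \,\|\, \vec{\sigma}) - \delta_k$ for all $n \ge n_k$. It
is now easy to construct a test $\vec{T}$ such that $\overline{\alpha}[\vec{T}] \le \varepsilon$ and $\underline{\zeta}[\vec{T}] \ge \inf_{\varepsilon' > \varepsilon} B(\varepsilon' \,|\, \vec{\rho} \,\|\, \vec{\sigma})$, which proves (20). On the other hand,
it is obvious that $B^{\dagger}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma})$ is left continuous for $0 < \varepsilon \le 1$;

$$ B^{\dagger}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \sup_{\varepsilon' < \varepsilon} B^{\dagger}(\varepsilon' \,|\, \vec{\rho} \,\|\, \vec{\sigma}). \tag{21} $$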


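Next, let

$$ \underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup \{ a \mid \overline{\alpha}(a) \le \varepsilon \}, $$

$$ \overline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup \{ a \mid \underline{\alpha}(a) < \varepsilon \} = \inf \{ a \mid \underline{\alpha}(a) \ge \varepsilon \} $$

for $0 \le \varepsilon \le 1$, and

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \underline{D}(0 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \sup \{ a \mid \lim_{n \to \infty} \alpha_n(a) = 0 \}, $$

$$ \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \overline{D}(1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \inf \{ a \mid \lim_{n \to \infty} \alpha_n(a) = 1 \}. $$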

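It should be noted that in the classical case when $\rho_n = P_{X^n}$ and $\sigma_n = P_{Y^n}$ we have

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = \text{p-}\liminf_{n \to \infty} \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{Y^n}(X^n)} $$

and

$$ \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{Y^n}(X^n)}, $$

where p-liminf and p-limsup are the liminf and limsup in probability:

$$ \text{p-}\liminf_{n \to \infty} A_n = \sup \{ a \mid \lim_{n \to \infty} \mathrm{Prob}\{A_n > a\} = 1 \}, $$

$$ \text{p-}\limsup_{n \to \infty} A_n = \inf \{ a \mid \lim_{n \to \infty} \mathrm{Prob}\{A_n > a\} = 0 \}. $$

Actually, $\overline{D}(\mathbf{X} \,\|\, \mathbf{Y})$ and $\underline{D}(\mathbf{X} \,\|\, \mathbf{Y})$ were introduced in [3], [4] by these expressions and called the spectral sup- and inf-divergence
rates between $\mathbf{X}$ and $\mathbf{Y}$.
It is clear that for any $0 \le \varepsilon_1 < \varepsilon_2 \le 1$

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \le \underline{D}(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \underline{D}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}), $$

$$ \overline{D}(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \overline{D}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}), $$

and

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \le \underline{D}(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \overline{D}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}). $$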

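In addition, when $\vec{\sigma} = \{\sigma_n\}$ consists of states (probability distributions in the classical case and density operators in the
quantum case) as well as $\vec{\rho} = \{\rho_n\}$, it follows from (15) that

$$ \alpha_n(a) \le e^{na} (1 - \beta_n(a)) \le e^{na}. \tag{22} $$

Hence we have $\lim_{n \to \infty} \alpha_n(a) = 0$ for any $a < 0$, which leads to

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \ge 0. \tag{23} $$

On the other hand, $\underline{D}(\vec{\rho} \,\|\, \vec{\sigma})$ and $\overline{D}(\vec{\rho} \,\|\, \vec{\sigma})$ may be negative when $\{\sigma_n\}$ are not states.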
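Theorem 1: For every $\varepsilon \in [0, 1]$

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}), \tag{24} $$

$$ B^{\dagger}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \overline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}). \tag{25} $$

In particular, we have

$$ B(\vec{\rho} \,\|\, \vec{\sigma}) = \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}), \tag{26} $$

$$ B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}) = \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}). \tag{27} $$

Proof: Recalling that $\alpha_n(a) = \alpha_n[S_n(a)]$ and using equation (19), we have

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge \sup_a \{ \underline{\zeta}(a) \mid \overline{\alpha}(a) \le \varepsilon \} \ge \sup \{ a \mid \overline{\alpha}(a) \le \varepsilon \} = \underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}). $$

To show the converse inequality, suppose that a test $\vec{T}$ and a real number $a$ satisfy

$$ \overline{\alpha}[\vec{T}] \le \varepsilon < \overline{\alpha}(a). $$

Note that we can assume with no loss of generality the existence of such an $a$, or equivalently the finiteness of $\underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma})$,
since the inequality is trivial otherwise. Then there exists a positive $\delta$ for which $\alpha_n(a) - \alpha_n[T_n] > \delta$ holds for infinitely many
$n$'s. Using equation (14) we have

$$ \beta_n[T_n] \ge e^{-na} \{\alpha_n(a) - \alpha_n[T_n]\} + \beta_n(a) > e^{-na} \delta $$

for these $n$'s, which implies that $\underline{\zeta}[\vec{T}] \le a$. This proves

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \inf \{ a \mid \overline{\alpha}(a) > \varepsilon \} = \underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}). $$

Equation (24) has thus been verified. We can also prove equation (25) almost in the same way. ■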
Remark 2: Equation (24) for the classical case was obtained by Han [3] as a slight modification of a result by Chen [9].
Equation (26) was also described in [3], giving credit to Verdú [10] for the original reference. As was mentioned in [3] and
is now obvious from (26) and (27), the equality $\underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = \overline{D}(\vec{\rho} \,\|\, \vec{\sigma})$ is necessary and sufficient for the so-called strong
converse property to hold in the sense that $\alpha_n[T_n]$ converges to 1 for any test $\vec{T}$ satisfying $\underline{\zeta}[\vec{T}] > B(\vec{\rho} \,\|\, \vec{\sigma})$.
V. Stein's lemma in the classical and quantum i.i.d. case

In the classical i.i.d. case when $\rho_n(x_1, \ldots, x_n) = \rho(x_1) \cdots \rho(x_n)$ and $\sigma_n(x_1, \ldots, x_n) = \sigma(x_1) \cdots \sigma(x_n)$, Stein's lemma
(e.g., [11], [12], [13]) claims that

$$ B(\varepsilon_1 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = B^{\dagger}(\varepsilon_2 \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = D(\rho \,\|\, \sigma) \tag{28} $$

for $0 \le \forall \varepsilon_1 < 1$ and $0 < \forall \varepsilon_2 \le 1$, where $D(\rho \,\|\, \sigma)$ is the Kullback-Leibler divergence: $D(\rho \,\|\, \sigma) = \sum_x \rho(x) \log \frac{\rho(x)}{\sigma(x)}$. A standard
proof of the lemma uses a similar argument to the proof of Theorem 1 to reduce (28) to

$$ \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = D(\rho \,\|\, \sigma), \tag{29} $$

which is equivalent to

$$ \lim_{n \to \infty} \frac{1}{n} \log \frac{\rho_n(X_1, \ldots, X_n)}{\sigma_n(X_1, \ldots, X_n)} = D(\rho \,\|\, \sigma) \quad \text{in probability,} $$

where $X_1, \ldots, X_n$ are random variables obeying the probability distribution $\rho_n$. Now this is a direct consequence of the weak
law of large numbers.
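The concentration expressed by (29) can be observed exactly for a pair of Bernoulli distributions, since $Z_n$ then depends on the sample only through the number of ones. The sketch below is an illustration with hypothetical parameters ($\rho$ = Bernoulli(0.7), $\sigma$ = Bernoulli(0.4), $n = 2000$, margin 0.05); it computes $\mathrm{Prob}\{Z_n > a\}$ from the binomial distribution, using log-space arithmetic to avoid overflow.

```python
import math


def log_binomial_pmf(n, k, p):
    """Logarithm of the Binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bernoulli(p) || Bernoulli(q)), natural log."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def spectrum_tail(n, p, q, a):
    """Exact Prob{Z_n > a} when X_1, ..., X_n are i.i.d. Bernoulli(p)."""
    total = 0.0
    for k in range(n + 1):
        # Z_n takes this value when exactly k of the n samples equal 1.
        z = (k * math.log(p / q)
             + (n - k) * math.log((1.0 - p) / (1.0 - q))) / n
        if z > a:
            total += math.exp(log_binomial_pmf(n, k, p))
    return total


p, q = 0.7, 0.4  # hypothetical parameters of rho and sigma
d = kl_bernoulli(p, q)
below = spectrum_tail(2000, p, q, d - 0.05)
above = spectrum_tail(2000, p, q, d + 0.05)
print(f"D = {d:.4f}, Prob(Z_n > D - 0.05) = {below:.4f}, "
      f"Prob(Z_n > D + 0.05) = {above:.4f}")
```

For thresholds below $D(\rho \,\|\, \sigma)$ the tail probability is close to one and for thresholds above it the probability is close to zero, which is the classical counterpart of the step-function limit discussed for the quantum case.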

Let us turn to the quantum i.i.d. case when $\rho_n = \rho^{\otimes n}$ and $\sigma_n = \sigma^{\otimes n}$, where $\rho$ and $\sigma$ are density operators on a Hilbert
space $\mathcal{H}$, and define the quantum relative entropy by $D(\rho \,\|\, \sigma) = \mathrm{Tr}\,[\rho (\log \rho - \log \sigma)]$ (e.g., [14]). The achievability part

$$ B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge D(\rho \,\|\, \sigma) \quad \text{for } 0 \le \forall \varepsilon < 1, \tag{30} $$

which is equivalent to $B(\vec{\rho} \,\|\, \vec{\sigma}) \ge D(\rho \,\|\, \sigma)$ by (20), was first proved by Hiai and Petz [15]. They showed the existence of a
sequence of POVMs $\vec{M} = \{M^{(n)}\}$ on $\{\mathcal{H}^{\otimes n}\}$ satisfying

$$ \liminf_{n \to \infty} \frac{1}{n} D_{M^{(n)}}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}) \ge D(\rho \,\|\, \sigma), \tag{31} $$

where $D_{M^{(n)}}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n})$ denotes the Kullback-Leibler divergence between the probability distributions $P^{M^{(n)}}_{\rho^{\otimes n}}(\cdot) = \mathrm{Tr}\,(\rho^{\otimes n} M^{(n)}(\cdot))$
and $P^{M^{(n)}}_{\sigma^{\otimes n}}(\cdot) = \mathrm{Tr}\,(\sigma^{\otimes n} M^{(n)}(\cdot))$. Since $n D(\rho \,\|\, \sigma) = D(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}) \ge D_{M^{(n)}}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n})$ follows from the monotonicity
of relative entropy, this leads to

$$ D(\rho \,\|\, \sigma) = \lim_{n \to \infty} \frac{1}{n} \sup_{M^{(n)}} D_{M^{(n)}}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}), \tag{32} $$

which is often referred to as the Hiai-Petz theorem. Now it is easy to see that the combination of (31) and the direct part of
the classical Stein's lemma leads to (30), as is shown in [15]. Hayashi [16] gave another construction of $\{M^{(n)}\}$ satisfying
(31) based on a representation-theoretic consideration$^3$. Inequality (30) can also be proved more directly, not by way of (31),
in several different ways as shown in [17], [18] and Remark 20 of [19], the last of which also appears in Sec. 3.6 of [20].
Note that these proofs conversely yield the existence of $\vec{M} = \{M^{(n)}\}$ achieving (31) with the help of (35) below.

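On the other hand, the converse part $B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}) \le D(\rho \,\|\, \sigma)$ was first shown in [21] by combining the quantum
Neyman-Pearson lemma (14) with the inequality

$$ \mathrm{Tr}\,(\rho^{\otimes n} \{\rho^{\otimes n} - e^{na} \sigma^{\otimes n} > 0\}) \le e^{-n\{a\theta - \psi(\theta)\}} \tag{33} $$

for $0 \le \forall \theta \le 1$, where $\psi(\theta) = \log \mathrm{Tr}\,(\rho^{1+\theta} \sigma^{-\theta})$. A simpler proof was given in [22].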



The quantum Stein's lemma has thus been established in the same form as (28). In the quantum case, (29) is not a ground of
(28), as at present we do not have a quantum version of the law of large numbers which directly applies to (29) (even though
only the inequality $\overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) \le D(\rho \,\|\, \sigma)$ follows immediately from (33)). Instead, (29) should be regarded as a consequence
of (28). So we restate it as a theorem.

Theorem 2: For arbitrary density operators $\rho$ and $\sigma$ on a Hilbert space, we have $\underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = \overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) = D(\rho \,\|\, \sigma)$ by letting
$\vec{\rho} = \{\rho^{\otimes n}\}$ and $\vec{\sigma} = \{\sigma^{\otimes n}\}$; in other words,

$$ \lim_{n \to \infty} \mathrm{Tr}\,(\rho^{\otimes n} \{\rho^{\otimes n} - e^{na} \sigma^{\otimes n} > 0\}) = \begin{cases} 1 & \text{if } a < D(\rho \,\|\, \sigma), \\ 0 & \text{if } a > D(\rho \,\|\, \sigma). \end{cases} \tag{34} $$

3 More precisely, the papers [15] and [16] showed different theorems, both of which include (31) as a special case; see [16] for details.
4 As a consequence of Eq. (2.63) in [20], the inequality of (33) turns out to be true for $\forall \theta \ge 0$.

Example 1: Let us numerically illustrate this theorem for

$$ \rho = \begin{pmatrix} 0.75 & 0.35 \\ 0.35 & 0.25 \end{pmatrix} \quad \text{and} \quad \sigma = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.1 \end{pmatrix}, $$

which are density operators (matrices) on $\mathcal{H} = \mathbb{C}^2$. The relative entropy in this case is $D(\rho \,\|\, \sigma) = 0.4013\cdots$. The graph of
the function $g_n(a) \overset{\mathrm{def}}{=} \mathrm{Tr}\,(\rho^{\otimes n} \{\rho^{\otimes n} - e^{na} \sigma^{\otimes n} > 0\})$ for $n = 5$, $15$ and $50$ is shown in Fig. 1, where we can see that the
slope of the graph around $a = D(\rho \,\|\, \sigma)$ gets steeper with increase of $n$, as equation (34) suggests. It is noted that drawing
the graph requires computing the spectral decomposition of the $2^n \times 2^n$ matrix $\rho^{\otimes n} - e^{na} \sigma^{\otimes n}$ for each $a$, which is too large
to apply a direct method when $n = 50$. We have applied the theory of irreducible decomposition of the algebra generated
by $\{A^{\otimes n} \mid A \in \mathbb{C}^{2 \times 2}\}$ based on the observation made in [16], [17], which reduces the problem to finding the spectral
decompositions of $\lceil (n + 1)/2 \rceil$ matrices whose sizes are at most $(n + 1) \times (n + 1)$. Details of the algorithm will be reported
elsewhere.
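The value of the relative entropy quoted in Example 1 can be reproduced with a few lines of Python. This is only a sketch for the single-copy quantity $D(\rho \,\|\, \sigma)$, not the algorithm used for the graph of $g_n(a)$; it assumes natural logarithms and a diagonal $\sigma = \mathrm{diag}(0.9, 0.1)$, and it relies on the closed-form eigenvalues of a real symmetric $2 \times 2$ matrix. The helper names are hypothetical.

```python
import math

rho = [[0.75, 0.35], [0.35, 0.25]]
sigma_diag = (0.9, 0.1)  # sigma = diag(0.9, 0.1); zero off-diagonals assumed


def sym2_eigenvalues(m):
    """Eigenvalues of a real symmetric 2x2 matrix in closed form."""
    mean = (m[0][0] + m[1][1]) / 2.0
    gap = math.hypot((m[0][0] - m[1][1]) / 2.0, m[0][1])
    return mean + gap, mean - gap


def relative_entropy(rho, sigma_diag):
    """D(rho || sigma) = Tr[rho (log rho - log sigma)] for diagonal sigma."""
    # Tr(rho log rho) depends only on the eigenvalues of rho.
    tr_rho_log_rho = sum(lam * math.log(lam)
                         for lam in sym2_eigenvalues(rho) if lam > 0)
    # sigma is diagonal, so Tr(rho log sigma) involves only the diagonal of rho.
    tr_rho_log_sigma = sum(rho[i][i] * math.log(sigma_diag[i]) for i in range(2))
    return tr_rho_log_rho - tr_rho_log_sigma


d = relative_entropy(rho, sigma_diag)
print(f"D(rho || sigma) = {d:.4f}")
```

The result can be compared with the value $0.4013\cdots$ stated in the example; agreement also supports the reading of $\sigma$ as a diagonal matrix and of $\log$ as the natural logarithm.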




Fig. 1. The graph of $g_n(a)$ for $n = 5$, $15$ and $50$ (horizontal axis: $a$).

Let us make some observations on the quantum Neyman-Pearson test $S_n(a)$ in connection with (31). We begin by considering
the general situation where $\vec{\rho} = \{\rho_n\}$ and $\vec{\sigma} = \{\sigma_n\}$ are arbitrarily given. For a sequence of tests $\vec{T} = \{T_n\}$, let $D_{T_n}(\rho_n \,\|\, \sigma_n)$
denote the Kullback-Leibler divergence of the resulting probability distributions $(\rho_n[T_n], 1 - \rho_n[T_n])$ and $(\sigma_n[T_n], 1 - \sigma_n[T_n])$.
Then we have

$$ D_{T_n}(\rho_n \,\|\, \sigma_n) = -h(\rho_n[T_n]) - \rho_n[T_n] \log \sigma_n[T_n] - (1 - \rho_n[T_n]) \log (1 - \sigma_n[T_n]) \ge -\log 2 - \rho_n[T_n] \log \sigma_n[T_n], $$

where $h$ is the binary entropy function; $h(t) = -t \log t - (1 - t) \log (1 - t)$. This proves that

$$ \liminf_{n \to \infty} \frac{1}{n} D_{T_n}(\rho_n \,\|\, \sigma_n) \ge \underline{\zeta}[\vec{T}] \quad \text{if } \lim_{n \to \infty} \alpha_n[T_n] = 0. \tag{35} $$
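The elementary bound behind (35), namely that the binary divergence is at least $-\log 2 - p \log q$ with $p = \rho_n[T_n]$ and $q = \sigma_n[T_n]$, can be checked on a grid of values. The sketch below assumes that both arguments are probabilities strictly between 0 and 1 and uses natural logarithms; the grid resolution is an arbitrary choice.

```python
import math


def binary_kl(p, q):
    """KL divergence between (p, 1 - p) and (q, 1 - q), natural logarithm."""
    total = 0.0
    for x, y in ((p, q), (1.0 - p, 1.0 - q)):
        if x > 0.0:
            total += x * math.log(x / y)
    return total


def lower_bound(p, q):
    """The bound -log 2 - p log q appearing in the derivation of (35)."""
    return -math.log(2.0) - p * math.log(q)


grid = [i / 50 for i in range(1, 50)]
bound_holds = all(binary_kl(p, q) >= lower_bound(p, q) - 1e-12
                  for p in grid for q in grid)
print(bound_holds)
```

The bound holds because $h(t) \le \log 2$ and $-(1 - p) \log (1 - q) \ge 0$; dividing by $n$ and letting $p \to 1$ then gives (35).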

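In particular, we have

$$ \liminf_{n \to \infty} \frac{1}{n} D_{S_n(a)}(\rho_n \,\|\, \sigma_n) \ge a \quad \text{if } a < \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}), \tag{36} $$

where we have used (19). Applying this to the i.i.d. case where $\rho_n = \rho^{\otimes n}$ and $\sigma_n = \sigma^{\otimes n}$, and recalling Theorem 2, we have

$$ \liminf_{n \to \infty} \frac{1}{n} D_{S_n(a)}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}) \ge a \quad \text{if } a < D(\rho \,\|\, \sigma). $$

Therefore, if a number sequence $\{a_n\}$ is chosen so that $a_n$ converges to $D(\rho \,\|\, \sigma)$ monotonically from below with a sufficiently
slow speed, then

$$ \liminf_{n \to \infty} \frac{1}{n} D_{S_n(a_n)}(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}) \ge D(\rho \,\|\, \sigma), \tag{37} $$

which gives an example of (31).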

Remark 3: From (36) and the monotonicity of quantum relative entropy, we obtain the general inequality

$$ \liminf_{n \to \infty} \frac{1}{n} D(\rho_n \,\|\, \sigma_n) \ge \underline{D}(\vec{\rho} \,\|\, \vec{\sigma}), $$

which has also appeared in [19].
Remark 4: Recently an extension of quantum Stein's lemma was reported under the name of "a quantum version of Sanov's
theorem" [23]. Relating to this work, let us make some remarks on the relation between Stein's lemma and Sanov's theorem 
(see e.g. [13]). These theorems are similar in that both of them represent the convergence rates of some probabilities in terms 
of relative entropy. Moreover, they are closely related to each other in their logical derivations. Nevertheless, we should note 
that they have their respective roles in different contexts in general; Stein's lemma is about hypothesis testing and Sanov's 
theorem is a fundamental theorem in large deviation theory for empirical distribution. We also note that distinction of their 
roles is indispensable for precise understanding of both the significance of the Neyman-Pearson lemma and that of empirical 
distributions. Even though the result of [23] has a certain significance from a viewpoint of hypothesis testing, its formulation 
does not precisely correspond to that of Sanov's theorem in the classical case. Finding a meaningful and useful quantum 
extension of Sanov's theorem is still a challenging open problem. See also Remark 8 below.

VI. Tradeoff between the exponents of the first and second kind error probabilities 

In order to properly evaluate the tradeoff between the error exponents of the first and second kinds for the classical hypothesis 
testing problem, Han [3], [4] introduced the following quantity: 

$$ B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup \{ R \mid \exists \vec{T}, \ \underline{\eta}[\vec{T}] \ge r \text{ and } \underline{\zeta}[\vec{T}] \ge R \} = \sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \underline{\eta}[\vec{T}] \ge r \}, \tag{38} $$

where the subscript $e$ is intended to mean that the quantity concerns exponents. Roughly speaking, the second kind error
probability optimally tends to 0 with the rate $\beta_n[T_n] \approx e^{-n B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma})}$ when the first kind error probability is required to tend to
0 with $\alpha_n[T_n] \approx e^{-nr}$ or faster. The same definition is applied to our setting including generalized and quantum hypothesis
testing problems. We shall give some characterizations of this quantity in the sequel, extending the formula obtained by Han.
The following lemma will play an essential role.

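Lemma 1: For any real number a and any sequence of tests $\vec{T}$ we have

\[ \underline{\zeta}(a) \ge \min\{ \underline{\zeta}[\vec{T}],\ a + \underline{\eta}[\vec{T}] \}, \quad\text{and} \tag{39} \]
\[ \underline{\eta}(a) \ge \min\{ \underline{\eta}[\vec{T}],\ -a + \underline{\zeta}[\vec{T}] \}. \tag{40} \]

In particular, for any a < b,

\[ \underline{\zeta}(a) \ge a + \underline{\eta}(b) \quad\text{if}\quad \underline{\zeta}(a) < \underline{\zeta}(b), \quad\text{and} \tag{41} \]
\[ \underline{\eta}(b) \ge -b + \underline{\zeta}(a) \quad\text{if}\quad \underline{\eta}(a) > \underline{\eta}(b). \tag{42} \]

Proof: We have

\[
\begin{aligned}
\underline{\zeta}(a) &= -\limsup_{n\to\infty} \frac{1}{n} \log \beta_n(a) \\
&\overset{\text{(i)}}{=} -\max\Big\{ \limsup_{n\to\infty} \frac{1}{n} \log \beta_n[T_n],\ \limsup_{n\to\infty} \frac{1}{n} \log \big(\beta_n(a) - \beta_n[T_n]\big) \Big\} \\
&= \min\Big\{ \underline{\zeta}[\vec{T}],\ \liminf_{n\to\infty} -\frac{1}{n} \log \big(\beta_n(a) - \beta_n[T_n]\big) \Big\} \\
&\overset{\text{(ii)}}{\ge} \min\{ \underline{\zeta}[\vec{T}],\ a + \underline{\eta}[\vec{T}] \},
\end{aligned}
\]

where the equality (i) follows from the formula

\[
\limsup_{n\to\infty} \frac{1}{n} \log (x_n + y_n)
= \max\Big\{ \limsup_{n\to\infty} \frac{1}{n} \log x_n,\ \limsup_{n\to\infty} \frac{1}{n} \log y_n \Big\}, \tag{43}
\]

which is valid for any sequences of positive numbers $\{x_n\}$, $\{y_n\}$, and the inequality (ii) follows from (16). The inequality
(39) is thus proved. The proof of (40) is similar and omitted.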
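
Formula (43) is easy to check numerically for exponentially decaying sequences. The following minimal sketch assumes the
illustrative choice $x_n = e^{-nu}$, $y_n = e^{-nv}$ (the rates u, v and the function name are our own, not part of the paper), for
which the right-hand side of (43) equals $\max\{-u, -v\}$; the sum is evaluated in the log domain to avoid underflow.

```python
import math

def normalized_log_sum(n, u, v):
    """Return (1/n) * log(exp(-n*u) + exp(-n*v)), computed stably in the log domain."""
    log_x, log_y = -n * u, -n * v
    m = max(log_x, log_y)
    return (m + math.log(math.exp(log_x - m) + math.exp(log_y - m))) / n

u, v = 0.7, 0.3              # hypothetical decay rates of x_n and y_n
limit = max(-u, -v)          # right-hand side of (43) for these sequences
for n in (10, 100, 1000, 10000):
    # the normalized log of the sum approaches the larger of the two exponents
    print(n, normalized_log_sum(n, u, v), limit)
```

The slower-decaying sequence dominates the sum, which is exactly the mechanism used in step (i) of the proof.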

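Theorem 3: For any r > 0,

\[
\begin{aligned}
B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma})
&= \sup_a \{ \underline{\zeta}(a) \mid \underline{\eta}(a) \ge r \} \\
&= \inf_a \{ a + \underline{\eta}(a) \mid \underline{\eta}(a) < r \} \\
&= \underline{\zeta}(a_0 - 0) = a_0 + \underline{\eta}(a_0 + 0),
\end{aligned}
\]

where

\[
a_0 \ (\in \mathbb{R} \cup \{-\infty, \infty\})
\overset{\mathrm{def}}{=} \sup \{ a' \mid \exists a,\ \underline{\eta}(a) \ge r \ \text{and}\ \underline{\zeta}(a') = \underline{\zeta}(a) \}.
\]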

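Remark 5: Throughout the paper we use the notation

\[
f(a + 0) = \lim_{\varepsilon \downarrow 0} f(a + \varepsilon), \quad
f(a - 0) = \lim_{\varepsilon \downarrow 0} f(a - \varepsilon), \quad
f(\infty) = \lim_{a \to \infty} f(a), \quad
f(-\infty) = \lim_{a \to -\infty} f(a)
\]

for a monotone function $f : \mathbb{R} \to \mathbb{R} \cup \{-\infty, \infty\}$.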

Remark 6: The formula given in the above theorem is valid even in the 'singular' case when the set $\{a \mid \underline{\eta}(a) \ge r\}$ is empty
or the entire real line $\mathbb{R}$. When the set is empty, we have $B_e(r | \vec{\rho} \| \vec{\sigma}) = -\infty$, although this does not occur in the case when
$\sigma_n$ are states (i.e., $\sigma_n[I] = 1$). When the set is $\mathbb{R}$, we have $B_e(r | \vec{\rho} \| \vec{\sigma}) = \infty$.

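Proof of Theorem 3: Since $\underline{\eta}(a) = \underline{\eta}[\vec{S}(a)]$ and $\underline{\zeta}(a) = \underline{\zeta}[\vec{S}(a)]$, it is immediate from the definition (38) of
$B_e(r | \vec{\rho} \| \vec{\sigma})$ that

\[ B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge \sup_a \{ \underline{\zeta}(a) \mid \underline{\eta}(a) \ge r \}. \tag{44} \]

Also, we immediately have

\[
\sup_a \{ \underline{\zeta}(a) \mid \underline{\eta}(a) \ge r \}
= \sup_{a'} \{ \underline{\zeta}(a') \mid \exists a,\ \underline{\eta}(a) \ge r \ \text{and}\ \underline{\zeta}(a') = \underline{\zeta}(a) \}
\ge \underline{\zeta}(a_0 - 0) \tag{45}
\]

and

\[ a_0 + \underline{\eta}(a_0 + 0) \ge \inf_a \{ a + \underline{\eta}(a) \mid \underline{\eta}(a) < r \}. \tag{46} \]

Next, for any sequence of tests $\vec{T} = \{T_n\}$ satisfying $\underline{\eta}[\vec{T}] \ge r$ and for any number a satisfying $\underline{\eta}(a) < r$, it follows
from (40) that $\underline{\zeta}[\vec{T}] \le a + \underline{\eta}(a)$, which proves

\[ B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \inf_a \{ a + \underline{\eta}(a) \mid \underline{\eta}(a) < r \}. \tag{47} \]

Finally, let us show that

\[ \underline{\zeta}(a_0 - 0) \ge a_0 + \underline{\eta}(a_0 + 0), \tag{48} \]

which, combined with (44) through (47), completes the proof of the theorem. Assume first that the set $\{a \mid \underline{\eta}(a) \ge r\}$ is
neither empty nor $\mathbb{R}$. Then, letting b be an arbitrary number satisfying $\underline{\eta}(b) < r$, it follows from (42) that

\[ \sup_a \{ \underline{\zeta}(a) \mid \underline{\eta}(a) \ge r \} \le b + \underline{\eta}(b) < b + r < \infty. \]

Therefore, invoking that $\lim_{a\to\infty} \underline{\zeta}(a) = \infty$ follows from (19), we see that $a_0$ is neither $\infty$ nor $-\infty$. Now, for an arbitrary
$\varepsilon > 0$, we have $\underline{\zeta}(a_0 - \varepsilon) < \underline{\zeta}(a_0 + \varepsilon)$ by the definition of $a_0$. Hence, from (41) of Lemma 1 we have

\[ \underline{\zeta}(a_0 - \varepsilon) \ge a_0 - \varepsilon + \underline{\eta}(a_0 + \varepsilon), \]

which leads to (48). When $\{a \mid \underline{\eta}(a) \ge r\} = \emptyset$, on the other hand, we have $a_0 = -\infty$, and (48) is obvious since the
right-hand side is $-\infty$. When $\{a \mid \underline{\eta}(a) \ge r\} = \mathbb{R}$, we have $a_0 = \infty$ and, again, (48) is obvious since the left-hand side is $\infty$. ■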

Remark 7: The formula $B_e(r | \vec{\rho} \| \vec{\sigma}) = \inf_a \{ a + \underline{\eta}(a) \mid \underline{\eta}(a) < r \}$ for the classical hypothesis testing was derived by Han
[3], [4], using a more complicated argument based on the technique of information-spectrum slicing. Note that this technique
consists of a procedure of partitioning a set and does not straightforwardly apply to the quantum setting. Theorem 3 of [9] was
also intended to give a general formula for the tradeoff between the error exponents of the first and second kinds, but its proof
contains a gap, and the theorem does not apply to the general case where $\underline{\eta}(a)$ and $\underline{\zeta}(a)$ may be discontinuous functions; see
Example 3.6 of [4].

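Example 2: Let us consider the case when $\vec{\rho}$ and $\vec{\sigma}$ consist of pure quantum states of the form $\rho_n = |\psi_n\rangle\langle\psi_n|$ and
$\sigma_n = |\varphi_n\rangle\langle\varphi_n|$, where $\psi_n$ and $\varphi_n$ are unit vectors in a Hilbert space $\mathcal{H}_n$ for every n, and let
$\delta_n \overset{\mathrm{def}}{=} \mathrm{Tr}(\rho_n \sigma_n) = |\langle\psi_n|\varphi_n\rangle|^2$. In order to treat the hypothesis testing problem for this situation, we can assume with no
loss of generality that

\[
\psi_n = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\quad\text{and}\quad
\varphi_n = \begin{pmatrix} \sqrt{\delta_n} \\ \sqrt{1 - \delta_n} \end{pmatrix},
\]

for which we have the following spectral decomposition:

\[ \rho_n - e^{na} \sigma_n = \lambda_1 E_1 + \lambda_2 E_2, \]

where

\[ \lambda_1 = \frac{1 - e^{na}}{2} + r, \qquad \lambda_2 = \frac{1 - e^{na}}{2} - r, \]

\[
E_1 = \frac{1}{2}
\begin{pmatrix}
1 + \dfrac{1 + e^{na} - 2 e^{na} \delta_n}{2r} & -\dfrac{e^{na} \sqrt{\delta_n (1 - \delta_n)}}{r} \\[2ex]
-\dfrac{e^{na} \sqrt{\delta_n (1 - \delta_n)}}{r} & 1 - \dfrac{1 + e^{na} - 2 e^{na} \delta_n}{2r}
\end{pmatrix},
\qquad
E_2 = \frac{1}{2}
\begin{pmatrix}
1 - \dfrac{1 + e^{na} - 2 e^{na} \delta_n}{2r} & \dfrac{e^{na} \sqrt{\delta_n (1 - \delta_n)}}{r} \\[2ex]
\dfrac{e^{na} \sqrt{\delta_n (1 - \delta_n)}}{r} & 1 + \dfrac{1 + e^{na} - 2 e^{na} \delta_n}{2r}
\end{pmatrix},
\]

with

\[ r \overset{\mathrm{def}}{=} \frac{\sqrt{(1 + e^{na})^2 - 4 e^{na} \delta_n}}{2}. \]

This leads to $\{\rho_n - e^{na} \sigma_n > 0\} = E_1$ and

\[ \mathrm{Tr}\,(\rho_n \{\rho_n - e^{na} \sigma_n > 0\}) = \frac{1}{2} + \frac{1 + e^{na} - 2 e^{na} \delta_n}{4r}. \]

Thus we have for every a > 0

\[
\lim_{n\to\infty} \mathrm{Tr}\,(\rho_n \{\rho_n - e^{na} \sigma_n > 0\}) = 1 \iff \lim_{n\to\infty} \delta_n = 0,
\qquad
\lim_{n\to\infty} \mathrm{Tr}\,(\rho_n \{\rho_n - e^{na} \sigma_n > 0\}) = 0 \iff \lim_{n\to\infty} \delta_n = 1,
\]

which yields

\[
\underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) =
\begin{cases} \infty & \text{if } \lim_{n\to\infty} \delta_n = 0, \\ 0 & \text{otherwise,} \end{cases}
\qquad
\overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) =
\begin{cases} 0 & \text{if } \lim_{n\to\infty} \delta_n = 1, \\ \infty & \text{otherwise.} \end{cases}
\]

Furthermore, it is not difficult to see that for every a > 0

\[ \underline{\eta}(a) = -\limsup_{n\to\infty} \frac{1}{n} \log \delta_n, \]

and letting c denote this constant $\underline{\eta}(a)$, we have

\[
B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) =
\begin{cases} \infty & \text{if } r \le c, \\ c & \text{if } r > c. \end{cases} \tag{49}
\]

Actually, the test $T_n^{(1)} \overset{\mathrm{def}}{=} \rho_n = |\psi_n\rangle\langle\psi_n|$ satisfies $\alpha_n[T_n^{(1)}] = 0$ and $\beta_n[T_n^{(1)}] = \delta_n$ for all n and hence
$\underline{\eta}[\vec{T}^{(1)}] = \infty$ and $\underline{\zeta}[\vec{T}^{(1)}] = c$, while the test $T_n^{(2)} \overset{\mathrm{def}}{=} I - \sigma_n = I - |\varphi_n\rangle\langle\varphi_n|$ satisfies
$\underline{\eta}[\vec{T}^{(2)}] = c$ and $\underline{\zeta}[\vec{T}^{(2)}] = \infty$. Equation (49) means that it suffices to consider only these extreme tests when our concern
is limited to the exponents of the error probabilities. In the i.i.d. case where $\mathcal{H}_n = \mathcal{H}^{\otimes n}$, $\psi_n = \psi^{\otimes n}$ and $\varphi_n = \varphi^{\otimes n}$ for
distinct unit vectors $\psi$ and $\varphi$, we have $\underline{D}(\vec{\rho} \| \vec{\sigma}) = \overline{D}(\vec{\rho} \| \vec{\sigma}) = \infty$, which is also seen from $D(\rho \| \sigma) = \infty$ together
with the argument of Section V, and $c = -\log |\langle\psi|\varphi\rangle|^2$.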
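
The closed form for $\mathrm{Tr}(\rho_n \{\rho_n - e^{na}\sigma_n > 0\})$ can be cross-checked against a direct computation on the 2x2 matrix
$\rho_n - e^{na}\sigma_n$. The sketch below is only an illustration under our own assumptions: the function names are hypothetical,
the states are taken real with $\psi_n = (1, 0)$ and $\varphi_n = (\sqrt{\delta_n}, \sqrt{1-\delta_n})$, and the overlap $\delta_n = e^{-0.5 n}$ used in
the loop is an arbitrary choice.

```python
import math

def accept_prob_closed_form(delta, a, n):
    """Closed form 1/2 + (1 + c - 2*c*delta) / (4*r) with c = e^{na}."""
    c = math.exp(n * a)
    r = math.sqrt((1 + c) ** 2 - 4 * c * delta) / 2
    return 0.5 + (1 + c - 2 * c * delta) / (4 * r)

def accept_prob_from_matrix(delta, a, n):
    """Same quantity from M = rho_n - e^{na} sigma_n, using E_1 = (M - lambda_2 I) / (lambda_1 - lambda_2)."""
    c = math.exp(n * a)
    m11 = 1 - c * delta                       # entries of the real symmetric matrix M
    m22 = -c * (1 - delta)
    m12 = -c * math.sqrt(delta * (1 - delta))
    half_trace = (m11 + m22) / 2
    root = math.sqrt(((m11 - m22) / 2) ** 2 + m12 ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # rho_n projects onto (1, 0), so Tr(rho_n E_1) is the (1,1) entry of E_1
    return (m11 - lam2) / (lam1 - lam2)

for n in (5, 10, 20, 40):
    delta_n = math.exp(-0.5 * n)              # hypothetical overlap with exponent c = 0.5
    print(n, accept_prob_closed_form(delta_n, 1.0, n), accept_prob_from_matrix(delta_n, 1.0, n))
```

With $\delta_n \to 0$ the acceptance probability tends to 1 for a > 0, in agreement with the first limit stated in the example.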

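Remark 8: In the classical i.i.d. case, it follows from Sanov's theorem and Cramér's theorem in large deviation theory that,
for $-D(\sigma \| \rho) < \forall a < D(\rho \| \sigma)$,

\[
\underline{\eta}(a) = \overline{\eta}(a) = \min_{\tau \,:\, \tau[\log \rho - \log \sigma] \le a} D(\tau \,\|\, \rho) = \max_{\theta \in \mathbb{R}} \big( \theta a - \psi(\theta) \big),
\]

where $\psi(\theta) \overset{\mathrm{def}}{=} \log \sum_x \rho(x)^{1+\theta} \sigma(x)^{-\theta}$, and

\[
\underline{\zeta}(a) = \overline{\zeta}(a) = \min_{\tau \,:\, \tau[\log \rho - \log \sigma] \ge a} D(\tau \,\|\, \sigma) = a + \underline{\eta}(a).
\]

Applying these relations to Theorem 3 with some additional calculations, we can derive several single-letterized expressions
(see footnote 5) for $B_e(r | \vec{\rho} \| \vec{\sigma})$ (see [3], [4], [11], [13]), among which are

\[
B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma})
= \min_{\tau \,:\, D(\tau \| \rho) \le r} D(\tau \,\|\, \sigma)
= \max_{-1 \le \theta < 0} \frac{(1 + \theta)\, r + \psi(\theta)}{\theta}. \tag{50}
\]

In the quantum i.i.d. case, on the other hand, we have no explicit formulas for $\underline{\eta}(a)$, $\underline{\zeta}(a)$ and $B_e(r | \vec{\rho} \| \vec{\sigma})$ at present; see
[18] and [20] (section 3.4) for some partial results (see footnote 6). The mathematical difficulty arising in the study of
$B_e(r | \vec{\rho} \| \vec{\sigma})$ is closely related to the absence of a "quantum large deviation theorem" applicable to $\underline{\eta}(a)$ and $\underline{\zeta}(a)$ (cf. Remark 4).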
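
The two classical expressions in (50) can be compared numerically for a small alphabet. The following is a rough sketch, not
a definitive implementation: it assumes a binary alphabet, uses plain grid search for both the minimization over $\tau$ and the
maximization over $\theta$, and the distributions, the value of r and all function names are hypothetical choices of ours.

```python
import math

def kl(t, q):
    """Relative entropy D(t || q) in nats for distributions given as tuples."""
    return sum(ti * math.log(ti / qi) for ti, qi in zip(t, q) if ti > 0)

def psi(theta, rho, sigma):
    """psi(theta) = log sum_x rho(x)^(1+theta) * sigma(x)^(-theta)."""
    return math.log(sum(p ** (1 + theta) * s ** (-theta) for p, s in zip(rho, sigma)))

def hoeffding_primal(r, rho, sigma, steps=20000):
    """Grid approximation of min D(tau || sigma) over binary tau with D(tau || rho) <= r."""
    best = math.inf
    for k in range(1, steps):
        tau = (k / steps, 1 - k / steps)
        if kl(tau, rho) <= r:
            best = min(best, kl(tau, sigma))
    return best

def hoeffding_dual(r, rho, sigma, steps=2000):
    """Grid approximation of max over -1 <= theta < 0 of ((1 + theta) r + psi(theta)) / theta."""
    best = -math.inf
    for k in range(steps):
        theta = -1 + k / steps               # grid on [-1, 0), never reaching theta = 0
        best = max(best, ((1 + theta) * r + psi(theta, rho, sigma)) / theta)
    return best

rho, sigma = (0.6, 0.4), (0.3, 0.7)          # hypothetical binary distributions
r = 0.05                                     # chosen below D(sigma || rho), so the constraint is active
print(hoeffding_primal(r, rho, sigma), hoeffding_dual(r, rho, sigma))
```

Up to the grid resolution the two printed values coincide, and the grid maximum never exceeds the grid minimum, as expected
from the min-max structure of (50).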

VII. Exponents of probability of correct testing 

Suppose that $\{\sigma_n\}$ are states (i.e., $\sigma_n[I] = 1$) and let r be a real number greater than $B^{\dagger}(\vec{\sigma} \| \vec{\rho}) = \overline{D}(\vec{\sigma} \| \vec{\rho})$. When the first
kind error probability $\alpha_n[T_n]$ of a sequence of tests $\vec{T}$ tends to 0 with a speed not slower than $e^{-nr}$, the second kind error
probability $\beta_n[T_n]$ inevitably tends to 1. In this case, the speed at which the probability of correct testing $1 - \beta_n[T_n]$ tends to 0
can be regarded as a measure to evaluate "badness" of $\{T_n\}$. Hence, it is meaningful to investigate the slowest convergence
rate of $1 - \beta_n[T_n]$ when $\alpha_n[T_n]$ is required to tend to 0 with $\alpha_n[T_n] \approx e^{-nr}$ or faster. We are thus led to introduce the
following quantity:

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \inf_{\vec{T}} \{ \overline{\zeta}^c[\vec{T}] \mid \underline{\eta}[\vec{T}] \ge r \}, \tag{51} \]

where

\[ \overline{\zeta}^c[\vec{T}] \overset{\mathrm{def}}{=} \limsup_{n\to\infty} \Big\{ -\frac{1}{n} \log (1 - \beta_n[T_n]) \Big\}. \]

Note that $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ is defined for every r but is meaningless when $r < B(\vec{\sigma} \| \vec{\rho}) = \underline{D}(\vec{\sigma} \| \vec{\rho})$ since it vanishes for such
an r. Han [3], [4] introduced $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ for the classical hypothesis testing problem and characterized it as

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \inf_a \{ a + \eta(a) + [r - \eta(a)]_+ \}, \tag{52} \]

where $[t]_+ \overset{\mathrm{def}}{=} \max\{t, 0\}$, assuming the two conditions that the limit $\eta(a) \overset{\mathrm{def}}{=} \lim_{n\to\infty} \eta_n(a)$ exists for all a and that for any
M there exists a K such that

\[ \liminf_{n\to\infty} -\frac{1}{n} \log \sigma_n \Big\{ \frac{1}{n} \log \frac{\sigma_n}{\rho_n} \ge K \Big\} \ge M, \]

or equivalently

\[ \underline{\zeta}^c(-\infty) = \infty. \tag{53} \]

In this section we provide $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ with a new characterization which needs no extra condition. Having in mind both
applicability to source coding problems and consistency with the notation in [3], [4], we exchange the roles of $\rho$ and $\sigma$ in the
normalization conditions assumed earlier, so that

\[ 0 \le \rho_n[I] \le \infty \quad\text{and}\quad \sigma_n[I] = 1 \tag{54} \]

are now assumed. Accordingly, the earlier definition of $\alpha_n[T_n]$ is changed into $\alpha_n[T_n] = \rho_n[I - T_n]$ together with those of
$\eta_n[T_n]$, $\eta_n(a)$, and $\eta(a)$. The arguments below are based on the inequalities (15) and (16), which do not suffer from this
change.

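Lemma 2: For any real number a and any sequence of tests $\vec{T}$ we have

\[ \overline{\zeta}^c[\vec{T}] \ge \min\{ \overline{\zeta}^c(a),\ a + \underline{\eta}[\vec{T}] \}. \tag{55} \]

In particular, for any a < b,

\[ \overline{\zeta}^c(b) \ge a + \underline{\eta}(b) \quad\text{if}\quad \overline{\zeta}^c(a) > \overline{\zeta}^c(b). \tag{56} \]

Proof: We have

\[
\begin{aligned}
\overline{\zeta}^c[\vec{T}] &= \limsup_{n\to\infty} -\frac{1}{n} \log \beta_n^c[T_n] \\
&\ge \limsup_{n\to\infty} -\frac{1}{n} \log \big( \beta_n^c(a) + e^{-na} \alpha_n[T_n] \big) \\
&\ge \min\{ \overline{\zeta}^c(a),\ a + \underline{\eta}[\vec{T}] \},
\end{aligned}
\]

where the first inequality follows from (16) and the second inequality follows from the general formula (cf. (43)):

\[
\liminf_{n\to\infty} \frac{1}{n} \log (x_n + y_n)
\le \max\Big\{ \liminf_{n\to\infty} \frac{1}{n} \log x_n,\ \limsup_{n\to\infty} \frac{1}{n} \log y_n \Big\}. \tag{57}
\]

■

5 The expressions (the first one in (50) in particular) are often referred to as Hoeffding's theorem after [24].
6 Note that $B(r | \vec{\rho} \| \vec{\sigma})$ in [20] corresponds to our $B_e(r | \vec{\sigma} \| \vec{\rho})$.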

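Theorem 4: For any real number r, we have

\[
\begin{aligned}
B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) &= B_{e,1}^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) && (58) \\
&= \sup_a \min\{ \overline{\zeta}^c(a),\ r + a \} && (59) \\
&= \inf_a \max\{ \overline{\zeta}^c(a),\ r + a \} && (60) \\
&= r + a_0^*, && (61)
\end{aligned}
\]

where

\[ B_{e,1}^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \inf_{\vec{T}} \{ \overline{\zeta}^c[\vec{T}] \mid \eta_n[T_n] \ge r \ (\forall n) \}, \tag{62} \]

\[ a_0^* \overset{\mathrm{def}}{=} \sup \{ a \mid \overline{\zeta}^c(a) - a \ge r \} = \inf \{ a \mid \overline{\zeta}^c(a) - a \le r \}. \tag{63} \]

Remark 9: Note that $\ge$ and $\le$ in the definition of $a_0^*$ can be replaced with > and <, respectively, because $\overline{\zeta}^c(a) - a$ is a
strictly decreasing function.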

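Proof: Suppose that a sequence of tests $\vec{T} = \{T_n\}$ satisfies $\underline{\eta}[\vec{T}] \ge r$. It then follows from (55) that

\[
\begin{aligned}
\overline{\zeta}^c[\vec{T}] &\ge \sup_a \min\{ \overline{\zeta}^c(a),\ \underline{\eta}[\vec{T}] + a \} \\
&\ge \sup_a \min\{ \overline{\zeta}^c(a),\ r + a \} \\
&\ge \sup_a \{ r + a \mid \overline{\zeta}^c(a) \ge r + a \} = r + a_0^*,
\end{aligned}
\]

which proves

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge \sup_a \min\{ \overline{\zeta}^c(a),\ r + a \} \ge r + a_0^*. \tag{64} \]

Next, we show that, for arbitrarily given r, a and n, there exists a test $T_n$ satisfying

\[ \eta_n[T_n] \ge r \quad\text{and}\quad \zeta_n^c[T_n] \le \max\{ \zeta_n^c(a),\ r + a \}. \tag{65} \]

When $\eta_n(a) \ge r$, it is obvious that $T_n \overset{\mathrm{def}}{=} S_n(a)$ satisfies this condition. When $\eta_n(a) < r$, let

\[
T_n \overset{\mathrm{def}}{=} I - e^{-n(r - \eta_n(a))} S_n^c(a)
= S_n(a) + \{ 1 - e^{-n(r - \eta_n(a))} \}\, S_n^c(a). \tag{66}
\]

In other words, $T_n$ is a randomized test which rejects the hypothesis $\rho_n$ with probability $e^{-n(r - \eta_n(a))}$ when, and only when,
the test $S_n(a)$ rejects $\rho_n$. Then we have

\[ \alpha_n[T_n] = \rho_n[ e^{-n(r - \eta_n(a))} S_n^c(a) ] = e^{-n(r - \eta_n(a))} \alpha_n(a) = e^{-nr} \]

and

\[ \beta_n^c[T_n] = \sigma_n[ e^{-n(r - \eta_n(a))} S_n^c(a) ] = e^{-n(r - \eta_n(a))} \beta_n^c(a) \ge e^{-n(r + a)}, \]

where the last inequality follows from (15). These are rewritten as $\eta_n[T_n] = r$ and $\zeta_n^c[T_n] \le r + a$, and imply (65).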
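
The construction (66) can be tried out on a small classical example. The sketch below is a minimal illustration under our own
assumptions: $\rho_n$ and $\sigma_n$ are given directly as probability vectors on a three-point set, tests are represented by vectors of
acceptance probabilities, and the function name and numerical values are hypothetical.

```python
import math

def randomized_test(rho, sigma, n, a, r):
    """Return the test T_n of (66) as a list of acceptance probabilities T_n(x)."""
    s = [1.0 if p - math.exp(n * a) * q > 0 else 0.0 for p, q in zip(rho, sigma)]  # S_n(a)
    alpha_a = sum(p * (1 - si) for p, si in zip(rho, s))    # alpha_n(a) = rho_n[I - S_n(a)]
    eta_a = -math.log(alpha_a) / n                          # eta_n(a)
    if eta_a >= r:
        return s                                            # S_n(a) itself already satisfies (65)
    shrink = math.exp(-n * (r - eta_a))                     # rejection probability used in (66)
    return [1 - shrink * (1 - si) for si in s]

rho = [0.5, 0.3, 0.2]       # hypothetical rho_n
sigma = [0.2, 0.3, 0.5]     # hypothetical sigma_n
n, a, r = 4, 0.0, 0.5
t = randomized_test(rho, sigma, n, a, r)
alpha = sum(p * (1 - ti) for p, ti in zip(rho, t))          # alpha_n[T_n]
beta_c = sum(q * (1 - ti) for q, ti in zip(sigma, t))       # beta_n^c[T_n] = sigma_n[I - T_n]
print(alpha, math.exp(-n * r), beta_c, math.exp(-n * (r + a)))
```

In this example $\alpha_n[T_n]$ equals $e^{-nr}$ and $\beta_n^c[T_n]$ stays above $e^{-n(r+a)}$, which are the two properties used to obtain (65).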

We have thus shown that for every r and a there exists a sequence of tests $\vec{T} = \{T_n\}$ such that $\eta_n[T_n] \ge r$ for every n and

\[
\overline{\zeta}^c[\vec{T}] \le \limsup_{n\to\infty} \max\{ \zeta_n^c(a),\ r + a \} = \max\{ \overline{\zeta}^c(a),\ r + a \},
\]

which leads to

\[
B_{e,1}^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \inf_a \max\{ \overline{\zeta}^c(a),\ r + a \}
\le \inf_a \{ r + a \mid \overline{\zeta}^c(a) \le r + a \} = r + a_0^*. \tag{67}
\]

Since $B_e^*(r | \vec{\rho} \| \vec{\sigma}) \le B_{e,1}^*(r | \vec{\rho} \| \vec{\sigma})$ is obvious, this together with (64) completes the proof. ■
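Remark 10: The proof of Lemma 2 can be modified, using (43) instead of (57), to yield

\[ \underline{\zeta}^c[\vec{T}] \ge \min\{ \underline{\zeta}^c(a),\ a + \underline{\eta}[\vec{T}] \}, \tag{68} \]

whereby we can similarly show that

\[
\begin{aligned}
\underline{B}_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) &= \underline{B}_{e,1}^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) && (69) \\
&= \sup_a \min\{ \underline{\zeta}^c(a),\ r + a \} && (70) \\
&= \inf_a \max\{ \underline{\zeta}^c(a),\ r + a \} && (71) \\
&= r + \underline{a}_0^*, && (72)
\end{aligned}
\]

where

\[ \underline{B}_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \inf_{\vec{T}} \{ \underline{\zeta}^c[\vec{T}] \mid \underline{\eta}[\vec{T}] \ge r \}, \tag{73} \]

\[ \underline{B}_{e,1}^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \inf_{\vec{T}} \{ \underline{\zeta}^c[\vec{T}] \mid \eta_n[T_n] \ge r \ (\forall n) \}, \tag{74} \]

and

\[ \underline{a}_0^* \overset{\mathrm{def}}{=} \sup \{ a \mid \underline{\zeta}^c(a) - a \ge r \} = \inf \{ a \mid \underline{\zeta}^c(a) - a \le r \}. \tag{75} \]

Now let us demonstrate how Han's formula (52) is derived from Theorem 4.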
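Lemma 3: It always holds that

\[ \overline{\zeta}^c(a) \le a + \overline{\eta}(a) \tag{76} \]

for any a, and if a is a decreasing point of $\overline{\zeta}^c$ in the sense that $\overline{\zeta}^c(a - \varepsilon) > \overline{\zeta}^c(a + \varepsilon)$ for any $\varepsilon > 0$, then we also have

\[ \overline{\zeta}^c(a + 0) \ge a + \underline{\eta}(a + 0). \tag{77} \]

Proof: The inequalities are immediate from (15) and (56), respectively. ■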
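Corollary 1: It always holds that

\[
B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le \inf_a \big( a + \max\{ \overline{\eta}(a),\ r \} \big)
= \inf_a \{ a + \overline{\eta}(a) + [r - \overline{\eta}(a)]_+ \}, \tag{78}
\]

while we have

\[
B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge \inf_a \big( a + \max\{ \underline{\eta}(a),\ r \} \big)
= \inf_a \{ a + \underline{\eta}(a) + [r - \underline{\eta}(a)]_+ \} \tag{79}
\]

if

\[ r \le \overline{\zeta}^c(-\infty) - \sup \{ a \mid \overline{\zeta}^c(-\infty) = \overline{\zeta}^c(a) \}. \tag{80} \]

In particular, if the limit $\eta(a) \overset{\mathrm{def}}{=} \lim_{n\to\infty} \eta_n(a)$ exists for all a and if

\[ \overline{\zeta}^c(-\infty) = \infty \quad\text{or}\quad \{ a \mid \overline{\zeta}^c(-\infty) = \overline{\zeta}^c(a) \} = \emptyset, \tag{81} \]

then Han's formula (52) is valid for all r.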

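Proof: Since the first inequality is immediate from Theorem 4 and (76), we only prove the second one. Invoking Theorem 4
again, it suffices to show that, for any r satisfying (80),

\[ \inf_a \max\{ \overline{\zeta}^c(a),\ r + a \} \ge \inf_a \big( a + \max\{ \underline{\eta}(a),\ r \} \big). \tag{82} \]

Define

\[ b_0 \overset{\mathrm{def}}{=} \sup \{ a \mid \overline{\zeta}^c(-\infty) = \overline{\zeta}^c(a) \}, \]

including the case when $\{ a \mid \overline{\zeta}^c(-\infty) = \overline{\zeta}^c(a) \} = \emptyset$ and $b_0 = -\infty$, and let a be an arbitrary number satisfying $a > b_0$. Then
we have $\overline{\zeta}^c(-\infty) > \overline{\zeta}^c(a)$, which means that there exists a number b such that b < a and $\overline{\zeta}^c(b) > \overline{\zeta}^c(a)$. The supremum of
such numbers b, denoted by $\bar{b} = \bar{b}(a)$, satisfies $\bar{b} \le a$ and $\overline{\zeta}^c(\bar{b} - \varepsilon) > \overline{\zeta}^c(a) \ge \overline{\zeta}^c(\bar{b} + \varepsilon)$ for every $\varepsilon > 0$, which implies that
$\bar{b}$ is a decreasing point of $\overline{\zeta}^c$ and $\overline{\zeta}^c(a) \ge \overline{\zeta}^c(\bar{b} + 0)$. Hence we have

\[
\begin{aligned}
\max\{ \overline{\zeta}^c(a),\ r + a \} &\ge \max\{ \overline{\zeta}^c(\bar{b} + 0),\ r + \bar{b} \} \\
&\ge \max\{ \underline{\eta}(\bar{b} + 0) + \bar{b},\ r + \bar{b} \} \\
&= \lim_{\varepsilon \downarrow 0} \big( \bar{b} + \varepsilon + \max\{ \underline{\eta}(\bar{b} + \varepsilon),\ r \} \big) \\
&\ge \inf_b \big( b + \max\{ \underline{\eta}(b),\ r \} \big),
\end{aligned}
\]

where the second inequality follows from (77), and therefore

\[ \inf_{a \,:\, a > b_0} \max\{ \overline{\zeta}^c(a),\ r + a \} \ge \inf_a \big( a + \max\{ \underline{\eta}(a),\ r \} \big). \tag{83} \]

This proves (82) when $b_0 = -\infty$. In the case when $b_0 > -\infty$, we have

\[
\inf_{a \,:\, a < b_0} \max\{ \overline{\zeta}^c(a),\ r + a \} \ge \inf_{a \,:\, a < b_0} \overline{\zeta}^c(a)
= \overline{\zeta}^c(-\infty) \ge \max\{ \overline{\zeta}^c(b_0),\ r + b_0 \}, \tag{84}
\]

where the last inequality follows from the nonincreasing property of $\overline{\zeta}^c$ and (80), and in addition we have

\[
\begin{aligned}
\max\{ \overline{\zeta}^c(b_0),\ r + b_0 \} &\ge \max\{ \overline{\zeta}^c(b_0 + 0),\ r + b_0 \} \\
&= \lim_{\varepsilon \downarrow 0} \max\{ \overline{\zeta}^c(b_0 + \varepsilon),\ r + b_0 + \varepsilon \} \\
&\ge \inf_{a \,:\, a > b_0} \max\{ \overline{\zeta}^c(a),\ r + a \}.
\end{aligned} \tag{85}
\]

Now the desired inequality (82) follows from (83), (84) and (85). ■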
Remark 11: The first condition in (81) in terms of $\overline{\zeta}^c$ is weaker than the original condition (53) in terms of $\underline{\zeta}^c$. The second
condition in (81) means that for any number a there always exists a number b < a such that $\overline{\zeta}^c(b) > \overline{\zeta}^c(a)$. The second is
not implied by the first, since there may be a number a such that $\overline{\zeta}^c(b) = \infty$ for all b < a.

Remark 12: It is easy to see that if the condition (80) is not satisfied then

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \overline{\zeta}^c(-\infty). \]

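Example 3: Consider the following classical hypothesis testing problem: $\mathcal{X}^n = \{x_0, x_1\}$ for all n, on which probability
distributions $\rho_n$ and $\sigma_n$ are defined by

\[ \rho_n(x_0) = e^{-n b_n}, \qquad \rho_n(x_1) = 1 - e^{-n b_n}, \tag{86} \]
\[ \sigma_n(x_0) = e^{-nc}, \qquad \sigma_n(x_1) = 1 - e^{-nc}, \tag{87} \]

where $b_n$ is a positive sequence obeying $\lim_{n\to\infty} b_n = \infty$ and c is a positive constant. Then the limits $\eta(a) = \lim_{n\to\infty} \eta_n(a)$
and $\zeta^c(a) = \lim_{n\to\infty} \zeta_n^c(a)$ exist for all a and satisfy

\[
\eta(a) = \begin{cases} 0 & \text{if } a > 0, \\ \infty & \text{if } a \le 0, \end{cases}
\qquad\text{and}\qquad
\zeta^c(a) = \begin{cases} 0 & \text{if } a > 0, \\ c & \text{if } a \le 0, \end{cases}
\]

where we have chosen $S_n(a) = \{\rho_n - e^{na} \sigma_n > 0\}$ in the definition of $S_n(a)$. Note also that $\underline{D}(\vec{\sigma} \| \vec{\rho}) = \overline{D}(\vec{\sigma} \| \vec{\rho}) = 0$. It is then
immediate from Theorem 4 that

\[
B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) =
\begin{cases} c & \text{if } r \ge c, \\ r & \text{if } 0 \le r < c, \\ 0 & \text{if } r < 0, \end{cases}
\]

while we have

\[
\inf_a \{ a + \eta(a) + [r - \eta(a)]_+ \} =
\begin{cases} r & \text{if } r \ge 0, \\ 0 & \text{if } r < 0. \end{cases}
\]

Since $\zeta^c(-\infty) = c$ and $\sup \{ a \mid \zeta^c(-\infty) = \zeta^c(a) \} = 0$, the condition (80) for validity of Han's formula turns out to be
$r \le c$, which just explains the above situation.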
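
The discrepancy in Example 3 can be reproduced by evaluating (59) and Han's formula (52) on a grid. This is only a sketch
under our own assumptions: the limits $\eta(a)$ and $\zeta^c(a)$ are hard-coded from the example, the constant c is set to the arbitrary
value 1.5, the supremum and infimum over a are approximated on a finite grid, and the function names are hypothetical.

```python
import math

C = 1.5  # hypothetical value of the positive constant c in (87)

def eta(a):
    """Limit eta(a) of Example 3: 0 for a > 0 and +infinity for a <= 0."""
    return 0.0 if a > 0 else math.inf

def zeta_c(a):
    """Limit zeta^c(a) of Example 3: 0 for a > 0 and c for a <= 0."""
    return 0.0 if a > 0 else C

GRID = [k / 1000 for k in range(-5000, 5001)]   # a in [-5, 5] with step 0.001

def theorem4_value(r):
    """sup_a min{zeta^c(a), r + a} as in (59), approximated on the grid."""
    return max(min(zeta_c(a), r + a) for a in GRID)

def han_value(r):
    """inf_a {a + eta(a) + [r - eta(a)]_+} as in (52), approximated on the grid."""
    return min(a + eta(a) + max(r - eta(a), 0.0) for a in GRID)

for r in (-1.0, 1.0, 2.0):
    # the two values agree (up to the grid step) only when r <= c
    print(r, theorem4_value(r), han_value(r))
```

For r = 2.0 > c the grid value of (59) stays at c while Han's expression keeps growing with r, which is the failure of (52)
outside the range allowed by condition (80).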

Remark 13: In the classical i.i.d. case, Han and Kobayashi [25] (see also [26]) obtained a compact expression for $B_e^*(r | \vec{\rho} \| \vec{\sigma})$
in the form

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \min_{\tau \,:\, D(\tau \| \rho) \le r} \{ D(\tau \,\|\, \sigma) + r - D(\tau \,\|\, \rho) \}, \tag{88} \]

noting that the RHS can be represented as $\min_{\tau \,:\, D(\tau \| \rho) \ge r} D(\tau \| \sigma)$ when r is sufficiently near $D(\sigma \| \rho)$. We also have an
expression in the form (see footnote 7)

\[ B_e^*(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = \max_{\theta \le -1} \frac{(1 + \theta)\, r + \psi(\theta)}{\theta}, \tag{89} \]

where $\psi(\theta) = \log \sum_x \rho(x)^{1+\theta} \sigma(x)^{-\theta}$ is the same function as defined in Remark 8. These expressions can be derived by
applying large deviation theorems to Theorem 4 (or to (52) as in [3], [4]) (cf. Remark 8). For the quantum i.i.d. case, it was
shown in [21] that inequality (55), with $\rho$ and $\sigma$ exchanged, yields a lower bound on $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ in the same form as the
RHS of (89) except that the range of max is restricted to $-2 \le \theta \le -1$ (see [22] for a simple derivation). This restriction
has been relaxed to $\theta \le -1$ just as (89) by [27] and [20] (section 3.4) (see footnote 8). Some further results on $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ are
also found in [20] including a quantum extension of (89) (not a bound but an identity) in terms of a variant of $\psi(\theta)$ which is
defined in a limiting form (see footnote 9).

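Before concluding this section, we introduce the dual of $B_e^*(r | \vec{\rho} \| \vec{\sigma})$ by

\[
B_e^{**}(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \overset{\mathrm{def}}{=} \sup_{\vec{T}} \{ \underline{\eta}[\vec{T}] \mid \overline{\zeta}^c[\vec{T}] \le r \}
= \sup \{ r' \mid B_e^*(r' \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le r \} \tag{90}
\]

and provide this with a general characterization, which will be applied to the source coding problem in the next section.
Theorem 5: We have

\[ B_e^{**}(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = r - a_0^{**}, \tag{91} \]

where

\[ a_0^{**} \overset{\mathrm{def}}{=} \inf \{ a \mid \overline{\zeta}^c(a) \le r \} = \sup \{ a \mid \overline{\zeta}^c(a) > r \}. \tag{92} \]

Proof: Obvious from the following equivalence:

\[
\begin{aligned}
B_e^*(r' \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \le r
&\iff \sup_a \min\{ \overline{\zeta}^c(a),\ r' + a \} \le r \\
&\iff \forall a\ \big( r' \le r - a \ \text{ or } \ \overline{\zeta}^c(a) \le r \big).
\end{aligned}
\]

■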


VIII. Application to classical fixed-length source coding 

Let $\vec{\rho} = \{\rho_n\}_{n=1}^{\infty}$ be a classical general source; i.e., let $\rho_n$ be a probability distribution $P_{X^n}$ on a finite or countably infinite
set $\mathcal{X}^n$ for each n. A (possibly stochastic) fixed-length coding system for this source is generally represented by a sequence
$\vec{\Phi} = \{\Phi_n\}_{n=1}^{\infty}$ of $\Phi_n = (\mathcal{Y}_n, F_n, G_n)$, where $\mathcal{Y}_n$ is a finite set, $F_n = \{F_n(y | x)\}$ is a channel from $\mathcal{X}^n$ to $\mathcal{Y}_n$ representing
an encoder, and $G_n = \{G_n(x | y)\}$ is a channel from $\mathcal{Y}_n$ to $\mathcal{X}^n$ representing a decoder. The size and the error probability of
$\Phi_n$ are respectively defined by $|\Phi_n| \overset{\mathrm{def}}{=} |\mathcal{Y}_n|$ and

\[ \gamma_n[\Phi_n] \overset{\mathrm{def}}{=} 1 - \sum_{x \in \mathcal{X}^n} \sum_{y \in \mathcal{Y}_n} \rho_n(x)\, F_n(y | x)\, G_n(x | y). \]

Now, let $\sigma_n$ be the counting measure on $\mathcal{X}^n$; i.e., $\sigma_n[T_n] = \sum_{x \in \mathcal{X}^n} T_n(x)$. Then the source coding problem for $\rho_n$ can be
reduced to the generalized hypothesis testing problem for $\{\rho_n, \sigma_n\}$ as follows. For an arbitrary coding system $\Phi_n$, a test $T_n$
is defined by

\[ T_n(x) \overset{\mathrm{def}}{=} \sum_{y \in \mathcal{Y}_n} F_n(y | x)\, G_n(x | y), \tag{93} \]

which satisfies

\[ \gamma_n[\Phi_n] = 1 - \rho_n[T_n] = \alpha_n[T_n] \tag{94} \]

and

\[
|\Phi_n| = |\mathcal{Y}_n| \ge \sum_{x \in \mathcal{X}^n} \sum_{y \in \mathcal{Y}_n} F_n(y | x)\, G_n(x | y) = \sigma_n[T_n] = \beta_n[T_n]. \tag{95}
\]

7 To the authors' knowledge, this type of expression for $B_e^*$ first appeared in [21] even for the classical case.
8 Note that $B^*(r | \vec{\rho} \| \vec{\sigma})$ in [20] corresponds to our $B_e^*(r | \vec{\sigma} \| \vec{\rho})$.
9 This result has not appeared in the original Japanese edition of [20].
Conversely, for an arbitrary deterministic (i.e., {0, 1}-valued) test $T_n$ we can construct a coding system $\Phi_n = (\mathcal{Y}_n, F_n, G_n)$
satisfying $\gamma_n[\Phi_n] = \alpha_n[T_n]$ and $|\Phi_n| = \beta_n[T_n]$ by setting $\mathcal{Y}_n = \{x \in \mathcal{X}^n \mid T_n(x) = 1\}$ and $F_n(y | x) = G_n(x | y) = 1$ if
$y = x \in \mathcal{Y}_n$. Noting that the direct (achievability) parts of the general formulas obtained so far for hypothesis testing,
Theorem 3 among them, have been shown by using only deterministic tests (by setting $S_n(a)$ to be $\{\rho_n - e^{na} \sigma_n > 0\}$ or
$\{\rho_n - e^{na} \sigma_n \ge 0\}$) and that $\limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| = -\underline{\zeta}[\vec{T}]$ when $|\Phi_n| = \beta_n[T_n]$, we immediately obtain the following identities:
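Theorem 6: We have

\[
\begin{aligned}
R(\varepsilon \,|\, \vec{\rho}) &\overset{\mathrm{def}}{=} \inf_{\vec{\Phi}} \Big\{ \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ \Big|\ \limsup_{n\to\infty} \gamma_n[\Phi_n] \le \varepsilon \Big\} \\
&= -B(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = -\underline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) && (96) \\
&= \inf \Big\{ a \ \Big|\ \limsup_{n\to\infty} P_{X^n} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \ge a \Big\} \le \varepsilon \Big\}, && (97)
\end{aligned}
\]

\[
\begin{aligned}
R^{\dagger}(\varepsilon \,|\, \vec{\rho}) &\overset{\mathrm{def}}{=} \inf_{\vec{\Phi}} \Big\{ \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ \Big|\ \liminf_{n\to\infty} \gamma_n[\Phi_n] < \varepsilon \Big\} \\
&= -B^{\dagger}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) = -\overline{D}(\varepsilon \,|\, \vec{\rho} \,\|\, \vec{\sigma}) && (98) \\
&= \inf \Big\{ a \ \Big|\ \liminf_{n\to\infty} P_{X^n} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \ge a \Big\} < \varepsilon \Big\}, && (99)
\end{aligned}
\]

\[
\begin{aligned}
R(\vec{\rho}) &\overset{\mathrm{def}}{=} R(0 \,|\, \vec{\rho}) \\
&= \inf \Big\{ R \ \Big|\ \exists \vec{\Phi},\ \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \le R \ \text{and}\ \lim_{n\to\infty} \gamma_n[\Phi_n] = 0 \Big\} \\
&= -B(\vec{\rho} \,\|\, \vec{\sigma}) = -\underline{D}(\vec{\rho} \,\|\, \vec{\sigma}) && (100) \\
&= \overline{H}(\vec{\rho}) \overset{\mathrm{def}}{=} \operatorname{p-}\limsup_{n\to\infty} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \Big\}, && (101)
\end{aligned}
\]

\[
\begin{aligned}
R^{\dagger}(\vec{\rho}) &\overset{\mathrm{def}}{=} R^{\dagger}(1 \,|\, \vec{\rho}) \\
&= \sup \Big\{ R \ \Big|\ \forall \vec{\Phi},\ \text{if } \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \le R \ \text{then}\ \lim_{n\to\infty} \gamma_n[\Phi_n] = 1 \Big\} \\
&= -B^{\dagger}(\vec{\rho} \,\|\, \vec{\sigma}) = -\overline{D}(\vec{\rho} \,\|\, \vec{\sigma}) && (102) \\
&= \underline{H}(\vec{\rho}) \overset{\mathrm{def}}{=} \operatorname{p-}\liminf_{n\to\infty} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \Big\}, && (103)
\end{aligned}
\]

and

\[
\begin{aligned}
R_e(r \,|\, \vec{\rho}) &\overset{\mathrm{def}}{=} \inf_{\vec{\Phi}} \Big\{ \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ \Big|\ \liminf_{n\to\infty} \Big\{ -\frac{1}{n} \log \gamma_n[\Phi_n] \Big\} \ge r \Big\} \\
&= -B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) && (104) \\
&= \sup_a \{ a - \underline{\sigma}(a) \mid \underline{\sigma}(a) < r \}, && (105)
\end{aligned}
\]

where

\[ \underline{\sigma}(a) \overset{\mathrm{def}}{=} \liminf_{n\to\infty} -\frac{1}{n} \log P_{X^n} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \ge a \Big\}. \tag{106} \]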

Remark 14: $R(\vec{\rho})$ is the optimal compression rate with asymptotically vanishing error probability, and Equation (101),
which was originally shown in [1], means that it always equals the spectral sup-entropy rate $\overline{H}(\vec{\rho})$. The source $\vec{\rho}$ is said to
have the strong converse property when $\limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| < R(\vec{\rho})$ implies $\lim_{n\to\infty} \gamma_n[\Phi_n] = 1$, or equivalently when
$R(\vec{\rho}) = R^{\dagger}(\vec{\rho})$. As was pointed out in [1], this property is equivalent to $\overline{H}(\vec{\rho}) = \underline{H}(\vec{\rho})$, which is now obvious from (101)
and (103). Equation (97) is found in [28], and (105) in [5]. Although the use of the symbol $\sigma$ for different notions, for the
counting measures and for the function (106), may be a little confusing, it will be helpful for comparing our results with those
of [5].
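
The reduction from deterministic tests to codes can be made concrete for a short block length. The sketch below is only an
illustration under our own assumptions: the source is i.i.d. Bernoulli(p) with hypothetical values of p and n, the test is
$S_n(a) = \{\rho_n - e^{na}\sigma_n > 0\}$ with $a = -R$ and $\sigma_n$ the counting measure, and the codebook is enumerated by brute force.

```python
import itertools
import math

def threshold_code(p, n, rate):
    """Return (Y_n, gamma_n) for Y_n = {x : P(x) > exp(-n * rate)} and an i.i.d. Bernoulli(p) source."""
    codebook, covered = [], 0.0
    for x in itertools.product((0, 1), repeat=n):
        ones = sum(x)
        prob = p ** ones * (1 - p) ** (n - ones)
        if prob > math.exp(-n * rate):       # x belongs to S_n(a) with a = -rate
            codebook.append(x)
            covered += prob
    return codebook, 1.0 - covered           # code size is |Y_n| = beta_n[T_n], error is alpha_n[T_n]

p, n = 0.2, 12                                # hypothetical source parameter and block length
for rate in (0.3, 0.5, 0.7):
    codebook, error = threshold_code(p, n, rate)
    print(rate, len(codebook), math.log(len(codebook)) / n, error)
```

Every codeword has probability larger than $e^{-nR}$, so the codebook size is at most $e^{nR}$, while the error probability is the
probability that $-\frac{1}{n} \log P_{X^n}(X^n) \ge R$; this is the finite-n counterpart of the spectrum appearing in (97).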

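Next, let us turn to the following quantity:

\[
R_e^*(r \,|\, \vec{\rho}) \overset{\mathrm{def}}{=} \inf_{\vec{\Phi}} \Big\{ \limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ \Big|\ \limsup_{n\to\infty} \Big\{ -\frac{1}{n} \log (1 - \gamma_n[\Phi_n]) \Big\} \le r \Big\}. \tag{107}
\]

Han [3], [5] proved that for any r > 0

\[ R_e^*(r \,|\, \vec{\rho}) = \inf \Big\{ h \ge 0 \ \Big|\ \inf_a \{ \sigma^*(a) + [a - \sigma^*(a) - h]_+ \} \le r \Big\} \tag{108} \]

under the assumption that the following limit exists for all a:

\[ \sigma^*(a) \overset{\mathrm{def}}{=} \lim_{n\to\infty} -\frac{1}{n} \log P_{X^n} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \le a \Big\}. \]

A general formula for $R_e^*(r | \vec{\rho})$ which needs no additional assumption is given below.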
A general formula for R* (r \ p) which needs no additional assumption is given below. 
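Theorem 7: For any r > 0 we have

\[ R_e^*(r \,|\, \vec{\rho}) = \max\{ b_0 - r,\ 0 \}, \tag{109} \]

where

\[ b_0 \overset{\mathrm{def}}{=} \sup \{ a \mid \overline{\sigma}^*(a) > r \} = \inf \{ a \mid \overline{\sigma}^*(a) \le r \} \]

and

\[ \overline{\sigma}^*(a) \overset{\mathrm{def}}{=} \limsup_{n\to\infty} -\frac{1}{n} \log P_{X^n} \Big\{ -\frac{1}{n} \log P_{X^n}(X^n) \le a \Big\}. \]

Proof: Since

\[ \overline{\sigma}^*(a) = \limsup_{n\to\infty} -\frac{1}{n} \log \rho_n[\{ \rho_n - e^{-na} \sigma_n \ge 0 \}] = \overline{\eta}^c(-a), \]

it follows from Theorem 5, with $\rho$ and $\sigma$ exchanged, that

\[ B_e^{**}(r \,|\, \vec{\sigma} \,\|\, \vec{\rho}) = r - b_0. \]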

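Hence (109) is equivalent to

\[ R_e^*(r \,|\, \vec{\rho}) = \max\{ -B_e^{**}(r \,|\, \vec{\sigma} \,\|\, \vec{\rho}),\ 0 \}. \tag{110} \]

Here it is easy to see LHS $\ge$ RHS from (94), (95) and

\[
\begin{aligned}
-B_e^{**}(r \,|\, \vec{\sigma} \,\|\, \vec{\rho}) &= -\sup_{\vec{T}} \{ \underline{\zeta}[\vec{T}] \mid \overline{\eta}^c[\vec{T}] \le r \} \\
&= \inf_{\vec{T}} \Big\{ \limsup_{n\to\infty} \frac{1}{n} \log \beta_n[T_n] \ \Big|\ \limsup_{n\to\infty} -\frac{1}{n} \log (1 - \alpha_n[T_n]) \le r \Big\}.
\end{aligned}
\]

Let us show the converse inequality LHS $\le$ RHS. Since

\[ -B_e^{**}(r \,|\, \vec{\sigma} \,\|\, \vec{\rho}) = \inf \{ r' \mid B_e^*(-r' \,|\, \vec{\sigma} \,\|\, \vec{\rho}) \le r \}, \]

it is sufficient to show that

\[ R_e^*(r \,|\, \vec{\rho}) \le r' \quad\text{if}\quad B_e^*(-r' \,|\, \vec{\sigma} \,\|\, \vec{\rho}) \le r \ \text{and}\ r' > 0. \]

This is equivalent to the proposition that for any r' > 0 there exists a sequence of codes such that

\[
\limsup_{n\to\infty} \frac{1}{n} \log |\Phi_n| \le r', \quad\text{and}\quad
\limsup_{n\to\infty} -\frac{1}{n} \log (1 - \gamma_n[\Phi_n]) \le B_e^*(-r' \,|\, \vec{\sigma} \,\|\, \vec{\rho}).
\]

This proposition follows if the infimum of

\[ B_e^*(-r' \,|\, \vec{\sigma} \,\|\, \vec{\rho}) = \inf_{\vec{T}} \{ \overline{\eta}^c[\vec{T}] \mid \underline{\zeta}[\vec{T}] \ge -r' \} \]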

can be attained by a sequence of deterministic tests $\vec{T}$ when r' > 0. Recalling the proof of Theorem 4 and applying it to
the present situation, it suffices to show that for any $a \in \mathbb{R}$, r' > 0, $\delta > 0$ and for any sufficiently large n there exists a
deterministic test satisfying

\[ \zeta_n[T_n] \ge -r' - \delta \quad\text{and}\quad \eta_n^c[T_n] \le \max\{ \eta_n^c(a),\ -r' - a \}, \tag{111} \]

which corresponds to (65) with a slight modification. Let $S_n(a)$ be chosen to be deterministic and identify it with its
acceptance region (e.g., $S_n(a) = \{x \in \mathcal{X}^n \mid \rho_n(x) - e^{na} \sigma_n(x) > 0\}$). It is then obvious that $T_n = S_n(a)$ satisfies (111) if
$\zeta_n(a) \ge -r'$. Suppose $\zeta_n(a) < -r'$, which means $\sigma_n[S_n(a)] = |S_n(a)| > e^{nr'}$. Then there exists a subset $T_n \subset S_n(a)$, which
is regarded as a deterministic test, satisfying

\[ \sigma_n[T_n] = |T_n| = \lceil e^{nr'} \rceil \quad\text{and}\quad \rho_n[T_n] \ge \frac{|T_n|}{|S_n(a)|}\, \rho_n[S_n(a)]. \]

Since $\rho_n[S_n(a)] \ge e^{na} \sigma_n[S_n(a)] = e^{na} |S_n(a)|$, we have $\rho_n[T_n] \ge e^{n(a + r')}$, and $T_n$ satisfies (111). ■
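Now we can see that Han's formula (108) is very near to the true general formula. Actually, if $\sigma^*(a)$ is simply replaced
with $\overline{\sigma}^*(a)$, it becomes equivalent to (109) as follows. Noting that $\overline{\sigma}^*(a)$ is monotonically nonincreasing, we have

\[
\begin{aligned}
& \inf_a \{ \overline{\sigma}^*(a) + [a - \overline{\sigma}^*(a) - h]_+ \} \le r \\
\iff\ & \inf_a \max\{ \overline{\sigma}^*(a),\ a - h \} \le r \\
\iff\ & \forall \delta > 0,\ \exists a,\ \overline{\sigma}^*(a) \le r + \delta \ \text{and}\ a - h \le r + \delta \\
\iff\ & h \ge \sup_{\delta > 0} \inf_a \{ a - (r + \delta) \mid \overline{\sigma}^*(a) \le r + \delta \} \\
\iff\ & h \ge \sup_{\delta > 0} \sup_a \{ a - (r + \delta) \mid \overline{\sigma}^*(a) > r + \delta \} \\
\iff\ & h \ge \sup_a \{ a - r \mid \exists \delta > 0,\ \overline{\sigma}^*(a) > r + \delta \} \\
\iff\ & h \ge \sup_a \{ a - r \mid \overline{\sigma}^*(a) > r \} = b_0 - r,
\end{aligned}
\]

and therefore

\[
\inf \Big\{ h \ge 0 \ \Big|\ \inf_a \{ \overline{\sigma}^*(a) + [a - \overline{\sigma}^*(a) - h]_+ \} \le r \Big\} = \max\{ b_0 - r,\ 0 \}.
\]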

Remark 15: Iriyama (and Ihara) [29], [30] obtained other forms of general formulas for $R_e(r | \vec{\rho})$ and $R_e^*(r | \vec{\rho})$ from a
different point of view.

Remark 16: Although we have treated only classical source coding here, extension to some quantum settings is actually
possible. Of the two major coding schemes proposed for the quantum pure state source coding, namely visible coding and
blind coding, the former is less restrictive and hence needs in general a more careful or stronger argument than the latter when
showing the converse part of a theorem concerning a limit on all possible codes. The situation is reversed when showing the
direct (achievability) part. It is easy to see that the direct parts of Theorems 6 and 7 are straightforwardly extended to visible
coding, and that only the direct part of the arguments concerning $R(\vec{\rho})$ and $R_e(r | \vec{\rho})$ in Theorem 6 is applicable to blind
coding, while it is not clear whether the other bounds in Theorems 6 and 7 are achievable for blind coding. On the other hand, it
has been shown in [31] that the inequality

\[ \gamma_n[\Phi_n] + e^{na} |\Phi_n| \ge \rho_n[\{ \rho_n - e^{na} \sigma_n \le 0 \}], \]

which follows from (14), (94) and (95), can be extended to visible coding in just the same form. Since the converse parts of
our theorems are direct consequences of this inequality, they are extended to visible coding, and hence to blind coding as well.
We thus have the same formula as Theorem 6 for both visible and blind coding, and Theorem 7 for visible coding. See [31]
for details. Hayashi [32] showed that the values $R^{\dagger}(\varepsilon | \vec{\rho})$, $R^{\dagger}(\vec{\rho})$, $R_e(r | \vec{\rho})$ have other operational meanings. He also treated
these values when the quantum information source is given by the thermal state of a Hamiltonian with interaction. That is, using
this discussion, we can treat the bounds $R^{\dagger}(\varepsilon | \vec{\rho})$, $R^{\dagger}(\vec{\rho})$, $R_e(r | \vec{\rho})$ in this case.

Remark 17: Recently, Hayashi [33] clarified the relation between $R(\varepsilon | \vec{\rho})$ and $R^{\dagger}(\varepsilon | \vec{\rho})$ from a wider viewpoint.

IX. Concluding remarks 

We have demonstrated that the information-spectrum analysis made by Han for the classical hypothesis testing for simple 
hypotheses, together with the fixed-length source coding, can be naturally extended to a unifying framework including both 
the classical and quantum generalized hypothesis testing. The generality of theorems and the simplicity of proofs have been 
thoroughly pursued and have yielded some improvements of the original classical results. 

The significance of our results for quantum information theory is not so clear at present, since our knowledge of the asymptotic
behavior of the quantum information spectrum $\rho_n[\{\rho_n - e^{na} \sigma_n > 0\}] = \mathrm{Tr}\,(\rho_n \{\rho_n - e^{na} \sigma_n > 0\})$ is insufficient even for the
i.i.d. case $\rho_n = \rho^{\otimes n}$, $\sigma_n = \sigma^{\otimes n}$. Therefore we cannot obtain compact and computable representations of information-spectrum
quantities. Nevertheless, the fact that the asymptotic characteristics of quantum hypothesis testing are represented in terms
of the information spectrum seems to suggest the importance of studying quantum information theory from the
information-spectrum viewpoint. An attempt in this direction is found in [19], where an approach similar to that of [2] is made
for the general (classical-)quantum channels. As an application, the capacity formula for quantum stationary memoryless
channels [34], [35] is provided with a new simple proof by linking it to the quantum Stein's lemma via our Theorem 2.

Finally, we mention some remarkable progress in related subjects reported after submitting the accepted version of the
present paper. The quantum Chernoff bound for symmetric Bayesian discrimination of two i.i.d. states has been established by
[36] and [37]. Based on an inequality shown in [37], it has been proved by [38] that $B_e(r | \vec{\rho} \| \vec{\sigma})$ for the quantum i.i.d. case
satisfies (cf. equation (50))

\[ B_e(r \,|\, \vec{\rho} \,\|\, \vec{\sigma}) \ge \max_{-1 \le \theta < 0} \frac{(1 + \theta)\, r + \psi(\theta)}{\theta}, \]

where $\psi(\theta) = \log \mathrm{Tr}\, \rho^{1+\theta} \sigma^{-\theta}$. This is the tightest lower bound on $B_e(r | \vec{\rho} \| \vec{\sigma})$ of those obtained so far. Moreover it seems
natural to conjecture that the bound achieves the equality in general.